Efficiently Determining Condition Relevant Modifiable Lifestyle Attributes

ABSTRACT

A bioinformatics method, software, system and database are presented in which attribute profiles containing pangenetic and non-pangenetic information are created and used to identify sets of attributes, including lifestyle attributes that are associated with a condition. The lifestyle attributes which can be modified and which result in a significant change in the probability of having the condition are identified and used as the basis for recommendations for lifestyle modification.

This application is a continuation of U.S. patent application Ser. No. 12/031,669, filed Feb. 14, 2008, entitled Efficiently Compiling Co-associating Bioattributes, which claims priority to U.S. Provisional Application Ser. No. 60/895,236, which was filed on Mar. 16, 2007, and which is incorporated herein by reference in its entirety.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description will be better understood when read in conjunction with the appended drawings, in which there is shown one or more of the multiple embodiments of the present invention. It should be understood, however, that the various embodiments are not limited to the precise arrangements and instrumentalities shown in the drawings.

FIG. 1 illustrates attribute categories and their relationships;

FIG. 2 illustrates a system diagram including data formatting, comparison, and statistical computation engines and dataset input/output for a method of creating an attribute combinations database;

FIG. 3 illustrates examples of genetic attributes;

FIG. 4 illustrates examples of epigenetic attributes;

FIG. 5 illustrates representative physical attributes classes;

FIG. 6 illustrates representative situational attributes classes;

FIG. 7 illustrates representative behavioral attributes classes;

FIG. 8 illustrates an attribute determination system;

FIG. 9 illustrates an example of expansion and reformatting of attributes;

FIG. 10 illustrates the advantage of identifying attribute combinations in a two attribute example;

FIG. 11 illustrates the advantage of identifying attribute combinations in a three attribute example;

FIG. 12 illustrates an example of statistical measures & formulas useful for the methods;

FIG. 13 illustrates a flow chart for a method of creating an attribute combinations database;

FIG. 14 illustrates a 1st dataset example for a method of creating an attribute combinations database;

FIG. 15 illustrates 2nd dataset and combinations table examples for a method of creating an attribute combinations database;

FIG. 16 illustrates a 3rd dataset example for a method of creating an attribute combinations database;

FIG. 17 illustrates a 4th dataset example for a method of creating an attribute combinations database;

FIG. 18 illustrates a 4th dataset example for a method of creating an attribute combinations database;

FIG. 19 illustrates a flowchart for a method of identifying predisposing attribute combinations;

FIG. 20 illustrates a rank-ordered tabulated results example for a method of identifying predisposing attribute combinations;

FIG. 21 illustrates a flowchart for a method of predisposition prediction;

FIG. 22 illustrates 1st and 2nd dataset examples for a method of predisposition prediction;

FIG. 23 illustrates 3rd dataset and tabulated results examples for a method of predisposition prediction;

FIG. 24 illustrates a flowchart for a method of destiny modification;

FIG. 25 illustrates 1st dataset, 3rd dataset and tabulated results examples for destiny modification of individual #113;

FIG. 26 illustrates 1st dataset, 3rd dataset and tabulated results examples for destiny modification of individual #114;

FIG. 27 illustrates a flowchart for a method of predisposition modification;

FIG. 28 illustrates a flowchart for a method of genetic attribute analysis;

FIG. 29 illustrates 3rd dataset examples from a method of destiny modification for use in synergy discovery;

FIG. 30 illustrates one embodiment of a computing system on which the present method and system can be implemented; and

FIG. 31 illustrates a representative deployment diagram for an attribute determination system.

DETAILED DESCRIPTION

Disclosed herein are methods, computer systems, databases and software for identifying combinations of attributes associated with individuals that co-occur (i.e., co-associate, co-aggregate) with attributes of interest, such as specific disorders, behaviors and traits. Disclosed herein are databases as well as database systems for creating and accessing databases describing those attributes and for performing analyses based on those attributes. The methods, computer systems and software are useful for identifying intricate combinations of attributes that predispose human beings toward having or developing specific disorders, behaviors and traits of interest, determining the level of predisposition of an individual towards such attributes, and revealing which attribute associations can be added or eliminated to effectively modify what may have been heretofore believed to be destiny. The methods, computer systems and software are also applicable for tissues and non-human organisms, as well as for identifying combinations of attributes that correlate with or cause behaviors and outcomes in complex non-living systems including molecules, electrical and mechanical systems and various devices and apparatus whose functionality is dependent on a multitude of attributes.

Previous methods have been largely unsuccessful in determining the complex combinations of attributes that predispose individuals to most disorders, behaviors and traits. The level of resolution afforded by the data typically used is too low, the number and types of attributes considered are too limited, and the sensitivity to detect low frequency, high complexity combinations is lacking. The desirability of being able to determine the complex combinations of attributes that predispose an individual to physical or behavioral disorders has clear implications for improving individualized diagnoses, choosing the most effective therapeutic regimens, making beneficial lifestyle changes that prevent disease and promote health, and reducing associated health care expenditures. It is also desirable to determine those combinations of attributes that promote certain behaviors and traits such as success in sports, music, school, leadership, career and relationships.

Advances in technology within the field of genetics now provide the ability to achieve maximum resolution of the entire genome. Discovery and characterization of epigenetic modifications (reversible chemical modifications of DNA and structural modification of chromatin that dramatically alter gene expression) has provided an additional level of information that may be altered due to environmental conditions, life experiences and aging. Along with a collection of diverse nongenetic attributes including physical, behavioral, situational and historical attributes associated with an organism, the present invention provides the ability to utilize the above information to enable prediction of the predisposition of an organism toward developing a specific attribute of interest provided in a query.

There are approximately 25,000 genes in the human genome. Of these, approximately 1,000 are involved in monogenic disorders, which are disorders caused solely by the properties of a single gene. This collection of disorders represents less than two percent of all human disorders. The remaining 98 percent of human disorders, termed complex disorders, are caused by multiple genetic influences or a combination of multiple genetic and non-genetic influences, still yet to be determined due to their resistance to current methods of discovery.

Previous methods using genetic information have suffered from either a lack of high resolution information, very limited coverage of total genomic information, or both. Genetic markers such as single nucleotide polymorphisms (SNPs) do not provide a complete picture of a gene's nucleotide sequence or the total genetic variability of the individual. The SNPs typically used occur at a frequency of at least 5% in the population. However, the majority of genetic variation that exists in the population occurs at frequencies below 1%. Furthermore, SNPs are spaced hundreds of nucleotides apart and do not account for genetic variation that occurs in the genetic sequence lying between, which is vastly more sequence than the single nucleotide position represented by an SNP. SNPs are typically located within gene coding regions and do not allow consideration of 98% of the 3 billion base pairs of genetic code in the human genome that does not encode gene sequences. Other markers such as STS, gene locus markers and chromosome loci markers also provide very low resolution and incomplete coverage of the genome. Complete and partial sequencing of an individual's genome provides the ability to incorporate that detailed information into the analysis of factors contributing toward expressed attributes.

Genomic influence on traits is now known to involve more than just the DNA nucleotide sequence of the genome. Regulation of expression of the genome can be influenced significantly by epigenetic modification of the genomic DNA and chromatin (3-dimensional genomic DNA with bound proteins). Termed the epigenome, this additional level of information can make genes in an individual's genome behave as if they were absent. Epigenetic modification can dramatically affect the expression of at least approximately 6% of all genes.

Epigenetic modification silences the activity of gene regulatory regions required to permit gene expression. Genes can undergo epigenetic silencing as a result of methylation of cytosines occurring in CpG dinucleotide motifs, and to a lesser extent by deacetylation of chromatin-associated histone proteins which inhibit gene expression by creating 3-dimensional conformational changes in chromatin. Assays such as bisulfite sequencing, differential methyl hybridization using microarrays, methylation sensitive polymerase chain reaction, and mass spectrometry enable the detection of cytosine nucleotide methylation, while chromatin immunoprecipitation (ChIP) can be used to detect histone acetylation states of chromatin.

In one embodiment, epigenetic attributes are incorporated in the present invention to provide certain functionality. First, major mental disorders such as schizophrenia and bipolar mood disorder are thought to be caused by or at least greatly influenced by epigenetic imprinting of genes. Second, all epigenetic modification characterized to date is reversible in nature, allowing for the potential therapeutic manipulation of the epigenome to alter the course and occurrence of disease and certain behaviors. Third, because epigenetic modification of the genome occurs in response to experiences and stimuli encountered during prenatal and postnatal life, epigenetic data can help fill gaps resulting from unobtainable personal data, and reinforce or even substitute for unreliable self-reported data such as life experiences and environmental exposures.

In addition to genetic and epigenetic attributes, which can be referred to collectively as pangenetic attributes, numerous other attributes likely influence the development of traits and disorders. These other attributes, which can be referred to collectively as non-pangenetic attributes, can be categorized individually as physical, behavioral, or situational attributes. FIG. 1 displays one embodiment of the attribute categories and their interrelationships according to the present invention and illustrates that physical and behavioral attributes can be collectively equivalent to the broadest classical definition of phenotype, while situational attributes can be equivalent to those typically classified as environmental. In one embodiment, historical attributes can be viewed as a separate category containing a mixture of genetic, epigenetic, physical, behavioral and situational attributes that occurred in the past. Alternatively, historical attributes can be integrated within the genetic, epigenetic, physical, behavioral and situational categories provided they are made readily distinguishable from those attributes that describe the individual's current state. In one embodiment, the historical nature of an attribute is accounted for via a time stamp or other time based marker associated with the attribute. As such, there are no explicit historical attributes, but through use of time stamping, the time associated with the attribute can be used to make a determination as to whether the attribute is occurring in what would be considered the present, or if it has occurred in the past. Traditional demographic factors are typically a small subset of attributes derived from the phenotype and environmental categories and can be therefore represented within the physical, behavioral and situational categories.

In the present invention the term ‘attributes’ rather than the term ‘factors’ is used since many of the entities are characteristics associated with an individual that may have no influence on the vast majority of their traits, behaviors and disorders. As such, there may be many instances during execution of the methods disclosed herein when a particular attribute does not act as a factor in determining predisposition. Nonetheless, every attribute remains a potentially important characteristic of the individual and may contribute to predisposition toward some other attribute or subset of attributes queried during subsequent or future implementation of the methods disclosed herein. In the present invention, the term ‘bioattribute’ can be used to refer to any attribute associated with a biological entity, such as an attribute associated with an organism or an attribute associated with a biologic molecule, for example. Therefore even a numerical address ZIP code, which is not a biological entity, can be a bioattribute when used to describe the residential location associated with a biological entity such as a person.

An individual possesses many associated attributes which may be collectively referred to as an ‘attribute profile’ associated with that individual. In one embodiment, an attribute profile can be considered as being comprised of the attributes that are present (i.e., occur) in that profile, as well as being comprised of the various combinations (i.e., combinations and subcombinations) of those attributes. The attribute profile of an individual is preferably provided to embodiments of the present invention as a dataset record whose association with the individual can be indicated by a unique identifier contained in the dataset record. An actual attribute of an individual can be represented by an attribute descriptor in attribute profiles, records, datasets, and databases. Herein, both actual attributes and attribute descriptors may be referred to simply as attributes. In one embodiment, statistical relationships and associations between attribute descriptors are a direct result of relationships and associations between actual attributes of an individual. In the present disclosure, the term ‘individual’ can refer to a singular group, person, organism, organ, tissue, cell, virus, molecule, thing, entity or state, wherein a state includes but is not limited to a state-of-being, an operational state or a status. Individuals, attribute profiles and attributes can be real and/or measurable, or they may be hypothetical and/or not directly observable.
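As a minimal sketch of the attribute profile described above, the following Python fragment models a dataset record as a unique identifier plus a set of attribute descriptors and enumerates the combinations and subcombinations of those attributes. The class name `AttributeProfile`, the identifier string and the attribute labels "A", "B" and "C" are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AttributeProfile:
    """One dataset record: a unique identifier plus attribute descriptors."""
    individual_id: str
    attributes: frozenset

    def attribute_combinations(self, max_size=None):
        """Yield every combination (and subcombination) of the attributes."""
        items = sorted(self.attributes)
        limit = len(items) if max_size is None else min(max_size, len(items))
        for size in range(1, limit + 1):
            for combo in combinations(items, size):
                yield combo


# Hypothetical record with three placeholder attribute descriptors.
profile = AttributeProfile("individual-001", frozenset({"A", "B", "C"}))
combos = list(profile.attribute_combinations())
```

A profile of n attributes yields 2^n - 1 non-empty combinations, which is why the optional `max_size` limit is included in this sketch.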

In one embodiment the present invention can be used to discover combinations of attributes regardless of number or type, in a population of any size, that cause predisposition to an attribute of interest. In doing so, this embodiment also has the ability to provide a list of attributes one can add or subtract from an existing profile of attributes in order to respectively increase or decrease the strength of predisposition toward the attribute of interest. The ability to accurately detect predisposing attribute combinations naturally benefits from being supplied with datasets representing large numbers of individuals and having a large number and variety of attributes for each. Nevertheless, the present invention will function properly with a minimal number of individuals and attributes. One embodiment of the present invention can be used to detect not only attributes that have a direct (causal) effect on an attribute of interest, but also those attributes that do not have a direct effect such as instrumental variables (i.e., correlative attributes), which are attributes that correlate with and can be used to predict predisposition for the attribute of interest but are not causal. For simplicity of terminology, both types of attributes are referred to herein as predisposing attributes, or simply attributes, that contribute toward predisposition toward the attribute of interest, regardless of whether the contribution or correlation is direct or indirect.

It is beneficial, but not necessary, in most instances, that the individuals whose data is supplied for the method be representative of the individual or population of individuals for which the predictions are desired. In a preferred embodiment, the attribute categories collectively encompass all potential attributes of an individual. Each attribute of an individual can be appropriately placed in one or more attribute categories of the methods, system and software of the invention. Attributes and the various categories of attributes can be defined as follows:

a) attribute: a quality, trait, characteristic, relationship, property, factor or object associated with or possessed by an individual;

b) genetic attribute: any genome, genotype, haplotype, chromatin, chromosome, chromosome locus, chromosomal material, deoxyribonucleic acid (DNA), allele, gene, gene cluster, gene locus, gene polymorphism, gene mutation, gene marker, nucleotide, single nucleotide polymorphism (SNP), restriction fragment length polymorphism (RFLP), variable tandem repeat (VTR), genetic marker, sequence marker, sequence tagged site (STS), plasmid, transcription unit, transcription product, ribonucleic acid (RNA), and copy DNA (cDNA), including the nucleotide sequence and encoded amino acid sequence of any of the above;

c) epigenetic attribute: any feature of the genetic material (all genomic, vector and plasmid DNA, and chromatin) that affects gene expression in a manner that is heritable during somatic cell divisions and sometimes heritable in germline transmission, but that is nonmutational to the DNA sequence and is therefore fundamentally reversible, including but not limited to methylation of DNA nucleotides and acetylation of chromatin-associated histone proteins;

d) pangenetic attribute: any genetic or epigenetic attribute;

e) physical attribute: any material quality, trait, characteristic, property or factor of an individual present at the atomic, molecular, cellular, tissue, organ or organism level, excluding genetic and epigenetic attributes;

f) behavioral attribute: any singular, periodic, or aperiodic response, action or habit of an individual to internal or external stimuli, including but not limited to an action, reflex, emotion or psychological state that is controlled or created by the nervous system on either a conscious or subconscious level;

g) situational attribute: any object, condition, influence, or milieu that surrounds, impacts or contacts an individual; and

h) historical attribute: any genetic, epigenetic, physical, behavioral or situational attribute that was associated with or possessed by an individual in the past. As such, the historical attribute refers to a past state of the individual and may no longer describe the current state.
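The category definitions above could be represented in software along the following lines. This is only an illustrative sketch: the names `Category` and `Attribute`, the `historical` flag and the example attribute labels are assumptions made for illustration, and the disclosure does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    GENETIC = "genetic"
    EPIGENETIC = "epigenetic"
    PHYSICAL = "physical"
    BEHAVIORAL = "behavioral"
    SITUATIONAL = "situational"


@dataclass(frozen=True)
class Attribute:
    name: str
    category: Category
    historical: bool = False  # True when the attribute describes a past state

    @property
    def is_pangenetic(self):
        """Pangenetic means genetic or epigenetic, per definition d)."""
        return self.category in (Category.GENETIC, Category.EPIGENETIC)


# Hypothetical example attributes.
methylation = Attribute("CpG site methylated", Category.EPIGENETIC)
past_height = Attribute("height at age 10", Category.PHYSICAL, historical=True)
incubator = Attribute("incubator temperature", Category.SITUATIONAL)
```

Treating "historical" as a flag rather than a sixth category corresponds to the alternative, described earlier, of integrating historical attributes within the other categories while keeping them distinguishable.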

The methods, systems, software, and databases disclosed herein apply to and are suitable for use with not only humans, but for other organisms as well. The methods, systems, software and databases may also be used for applications that consider attribute identification, predisposition potential and destiny modification for organs, tissues, individual cells, and viruses both in vitro and in vivo. For example, the methods can be applied to behavior modification of individual cells being grown and studied in a laboratory incubator by providing pangenetic attributes of the cells, physical attributes of the cells such as size, shape and surface receptor densities, and situational attributes of the cells such as levels of oxygen and carbon dioxide in the incubator, temperature of the incubator, and levels of glucose and other nutrients in the liquid growth medium. Using these and other attributes, the methods, systems, software and databases can then be used to predict predisposition of the cells for such characteristics as susceptibility to infection by viruses, general growth rate, morphology, and differentiation potential. The methods, systems, software, and databases disclosed herein can also be applied to complex non-living systems to, for example, predict the behavior of molecules or the performance of electrical devices or machinery subject to a large number of variables.

FIG. 2 illustrates system components corresponding to one embodiment of a method, system, software, and databases for compiling predisposing attribute combinations. Attributes can be stored in the various datasets of the system. In one embodiment, 1st dataset 200 is a raw dataset of attributes that may be converted and expanded by conversion/formatting engine 220 into a more versatile format and stored in expanded 1st dataset 202. Comparison engine 222 can perform a comparison between attributes from records of the 1st dataset 200 or expanded 1st dataset 202 to determine candidate predisposing attributes which are then stored in 2nd dataset 204. Comparison engine 222 can tabulate a list of all possible combinations of the candidate attributes and then perform a comparison of those combinations with attributes contained within individual records of 1st dataset 200 or expanded 1st dataset 202. Comparison engine 222 can store those combinations that are found to occur and meet certain selection criteria in 3rd dataset 206 along with a numerical frequency of occurrence obtained as a count during the comparison. Statistical computation engine 224 can perform statistical computations using the numerical frequencies of occurrence to obtain results (values) for strength of association between attributes and attribute combinations and then store those results in 3rd dataset 206. Statistical computation engine 224, alone or in conjunction with comparison engine 222, can create a 4th dataset 208 containing attributes and attribute combinations that meet a minimum or maximum statistical requirement by applying a numerical or statistical filter to the numerical frequencies of occurrence or the values for strength of association stored in 3rd dataset 206. Although represented as a system and engines, the system and engines can be considered subsystems of a larger system, and as such referred to as subsystems. Such subsystems may be implemented as sections of code, objects, or classes of objects within a single system, or may be separate hardware and software platforms which are integrated with other subsystems to form the final system.
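The flow from the 1st dataset through the 4th dataset can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: relative risk is used here as one possible measure of strength of association (the measures actually contemplated are those of FIG. 12), candidate attributes are taken to be those occurring in at least one record that has the query attribute, and the records, the attribute labels "A", "B", "C" and "Q", and the function names are all hypothetical.

```python
import math
from itertools import combinations


def relative_risk(a, b, c, d):
    """a: combo with query, b: combo without query,
    c: query without combo, d: neither combo nor query."""
    with_combo = a / (a + b) if (a + b) else 0.0
    without_combo = c / (c + d) if (c + d) else 0.0
    if without_combo == 0.0:
        return math.inf if with_combo > 0 else math.nan
    return with_combo / without_combo


def compile_combinations(first_dataset, query, max_size=2, min_count=1):
    """Return (2nd dataset of candidates, 3rd dataset of counted combinations)."""
    positives = [attrs - {query} for attrs in first_dataset.values() if query in attrs]
    negatives = [attrs for attrs in first_dataset.values() if query not in attrs]
    # 2nd dataset: candidate attributes seen in query-positive records.
    second_dataset = sorted(set().union(*positives))
    third_dataset = {}
    for size in range(1, max_size + 1):
        for combo in combinations(second_dataset, size):
            needed = set(combo)
            a = sum(1 for attrs in positives if needed <= attrs)
            if a < min_count:  # selection criterion: combination must occur
                continue
            b = sum(1 for attrs in negatives if needed <= attrs)
            c = len(positives) - a
            d = len(negatives) - b
            third_dataset[combo] = {"count": a,
                                    "relative_risk": relative_risk(a, b, c, d)}
    return second_dataset, third_dataset


# Hypothetical 1st dataset: record identifier -> set of attributes.
first = {1: {"A", "B", "Q"}, 2: {"A", "B", "Q"}, 3: {"C", "Q"},
         4: {"A"}, 5: {"B"}, 6: {"C"}}
second, third = compile_combinations(first, query="Q")
# 4th dataset: apply a minimum statistical requirement to the 3rd dataset.
fourth = {combo: stats for combo, stats in third.items()
          if stats["relative_risk"] >= 2.0}
```

In this toy data the combination of "A" and "B" is more strongly associated with "Q" than either attribute alone, which is the kind of effect that evaluating combinations is meant to expose.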

FIGS. 3A and 3B show a representative form for genetic attributes as DNA nucleotide sequence with each nucleotide position associated with a numerical identifier. In this form, each nucleotide is treated as an individual genetic attribute, thus providing maximum resolution of the genomic information of an individual. FIG. 3A depicts a portion of the known gene sequence for the HTR2A gene for two individuals having a nucleotide difference at nucleotide sequence position number 102. Comparing known genes simplifies the task of properly phasing nucleotide sequence comparisons. However, for comparison of non-gene sequences, due to the presence of insertions and deletions of varying size in the genome of one individual versus another, markers such as STS sequences can be used to allow for a proper in-phase comparison of the DNA sequences between different individuals. FIG. 3B shows genomic DNA plus-strand sequence for two individuals beginning at the STS#68777 forward primer which provides a known location of the sequence within the genome and facilitates phasing of the sequence with other sequences from that region of the genome during sequence comparison.
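Treating each numbered nucleotide position as its own genetic attribute might look like the sketch below. The two short sequences are invented placeholders, not actual HTR2A sequence, and the starting position of 98 is chosen only so that the single difference falls at position 102; the function names are likewise hypothetical. The sketch assumes the two sequences are already in phase, which is what known gene coordinates or STS markers are meant to ensure.

```python
def sequence_to_attributes(sequence, start=1):
    """Map each numbered nucleotide position to its nucleotide."""
    return {pos: nt for pos, nt in enumerate(sequence.upper(), start=start)}


def differing_positions(first, second):
    """Return positions present in both individuals whose nucleotides differ."""
    shared = first.keys() & second.keys()
    return sorted(pos for pos in shared if first[pos] != second[pos])


# Placeholder sequences for two individuals, numbered from position 98.
individual_1 = sequence_to_attributes("ATGGCTTAC", start=98)
individual_2 = sequence_to_attributes("ATGGTTTAC", start=98)
diffs = differing_positions(individual_1, individual_2)
```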

Conversion/formatting engine 220 of FIG. 2 can be used in conjunction with comparison engine 222 to locate and number the STS marker positions within the sequence data and store the resulting data in expanded 1st dataset 202. In one embodiment, comparison engine 222 has the ability to recognize strings of nucleotides with a word size large enough to enable accurately phased comparison of individual nucleotides in the span between marker positions. This function is also valuable in comparing known gene sequences. Nucleotide sequence comparisons in the present invention can also involve transcribed sequences in the form of mRNA, tRNA, rRNA, and cDNA sequences which all derive from genomic DNA sequence and are handled in the same manner as nucleotide sequences of known genes.

FIGS. 3C and 3D show two other examples of genetic attributes that may be compared in one embodiment of the present invention and the format they may take. Although not preferred because of the relatively small amount of information provided, SNPs (FIG. 3C) and allele identity (FIG. 3D) can be processed by one or more of the methods herein to provide a limited comparison of the genetic content of individuals.

FIGS. 4A and 4B show examples of epigenetic data that can be compared, the preferred epigenetic attributes being methylation site data. FIG. 4A represents a format of methylation data for hypothetical Gene X for two individuals, where each methylation site (methylation variable position) is distinguishable by a unique alphanumeric identifier. The identifier may be further associated with a specific gene, site or chromosomal locus of the genome. In this embodiment, the methylation status at each site is an attribute that can have either of two values: methylated (M) or unmethylated (U). Other epigenetic data and representations of epigenetic data can be used to perform the methods disclosed herein, and to construct the systems, software and databases disclosed herein, as will be understood by one skilled in the art.

As shown in FIG. 4B, an alternative way to organize epigenetic methylation data is to append it directly to the corresponding genetic sequence attribute dataset as methylation status at each candidate CpG dinucleotide occurring in that genomic nucleotide sequence, in this example for hypothetical Gene Z for two individuals. The advantage of this format is that it inherently includes chromosome, gene and nucleotide position information. In this format, which is the most complete and informative format for the raw data, the epigenetic data can be extracted and converted to another format at any time. Both formats (that of FIG. 4A as well as that of FIG. 4B) provide the same resolution of methylation data, but it is preferable to adhere to one format in order to facilitate comparison of epigenetic data between different individuals. Regarding either data format, in instances where an individual is completely lacking a methylation site due to a deletion or mutation of the corresponding CpG dinucleotide, the corresponding epigenetic attribute value should be omitted (i.e., assigned a null).
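A sketch of the identifier-keyed methylation format, including the null assignment for a missing site, is given below. The site identifiers "X1" through "X3", the status values assigned to each individual and the function name `compare_methylation` are hypothetical; only the M/U encoding and the use of a null for an absent site come from the description above.

```python
def compare_methylation(first, second):
    """Compare methylation status site by site.

    Each argument maps a site identifier to "M", "U", or None, where None
    marks a site the individual lacks (deleted or mutated CpG dinucleotide).
    """
    comparison = {}
    for site in sorted(first.keys() | second.keys()):
        status_1, status_2 = first.get(site), second.get(site)
        if status_1 is None or status_2 is None:
            comparison[site] = "not comparable"
        elif status_1 == status_2:
            comparison[site] = "same"
        else:
            comparison[site] = "different"
    return comparison


# Hypothetical methylation data for two individuals.
individual_1 = {"X1": "M", "X2": "U", "X3": "M"}
individual_2 = {"X1": "M", "X2": "M", "X3": None}  # site X3 is absent
result = compare_methylation(individual_1, individual_2)
```

Keeping the null explicit, rather than defaulting to unmethylated, prevents an absent site from being counted as a shared or differing attribute.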

FIG. 5 illustrates representative classes of physical attributes as defined by physical attributes metaclass 500, which can include physical health class 510, basic physical class 520, and detailed physical class 530, for example. In one embodiment physical health class 510 includes a physical diagnoses subclass 510.1 that includes the following specific attributes (objects), which when positive indicate a known physical diagnosis:

510.1.1 Diabetes

510.1.2 Heart Disease

510.1.3 Osteoporosis

510.1.4 Stroke

510.1.5 Cancer

    510.1.5.1 Prostate Cancer
    510.1.5.2 Breast Cancer
    510.1.5.3 Lung Cancer
    510.1.5.4 Colon Cancer
    510.1.5.5 Bladder Cancer
    510.1.5.6 Endometrial Cancer
    510.1.5.7 Non-Hodgkin's Lymphoma
    510.1.5.8 Ovarian Cancer
    510.1.5.9 Kidney Cancer
    510.1.5.10 Leukemia
    510.1.5.11 Cervical Cancer
    510.1.5.12 Pancreatic Cancer
    510.1.5.13 Skin melanoma
    510.1.5.14 Stomach Cancer

510.1.6 Bronchitis

510.1.7 Asthma

510.1.8 Emphysema

The above classes and attributes represent the current condition of the individual. In the event that the individual (e.g. consumer 810) had a diagnosis for an ailment in the past, the same classification methodology can be applied, but with an “h” placed after the attribute number to denote a historical attribute. For example, 510.1.4h can be used to create an attribute to indicate that the individual suffered a stroke in the past, as opposed to 510.1.4 which indicates the individual is currently suffering a stroke or the immediate aftereffects. Using this approach, historical classes and attributes mirroring the current classes and attributes can be created, as illustrated by historical physical health class 510 h, historical physical diagnoses class 510.1 h, historical basic physical class 520 h, historical height class 520.1 h, historical detailed physical class 530 h, and historical hormone levels class 530.1 h. In an alternate embodiment historical classes and historical attributes are not utilized. Rather, time stamping of the diagnosis or event is used. In this approach, an attribute of 510.1.4-05FEB03 would indicate that the individual suffered a stroke on Feb. 5, 2003. Alternate classification schemes and attribute classes/classifications can be used and will be understood by one of skill in the art. In one embodiment, time stamping of attributes is preferred in order to permit accurate determination of those attributes or attribute combinations that are associated with an attribute of interest (i.e., a query attribute or target attribute) in a causative or predictive relationship, or alternatively, those attributes or attribute combinations that are associated with an attribute of interest in a consequential or symptomatic relationship. In one embodiment, only attributes bearing a time stamp that predates the time stamp of the attribute of interest are processed by the methods. In another embodiment, only attributes bearing a time stamp that postdates the time stamp of the attribute of interest are processed by the methods. In another embodiment, both attributes that predate and attributes that postdate an attribute of interest are processed by the methods.
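The time-stamped notation and the predate/postdate selection can be illustrated with the sketch below. It assumes the stamp always has the form DDMMMYY as in the 510.1.4-05FEB03 example and that two-digit years fall in the 2000s; both are assumptions made for illustration. The function names and the dates attached to attributes 530.3 and 510.1.1 are hypothetical.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_stamped(attribute):
    """Split 'code-DDMMMYY' into (code, date); assumes years are 20YY."""
    code, _, stamp = attribute.rpartition("-")
    day = int(stamp[0:2])
    month = MONTHS[stamp[2:5].upper()]
    year = 2000 + int(stamp[5:7])
    return code, date(year, month, day)


def split_by_time(attributes, query):
    """Return (codes that predate the query, codes that postdate it)."""
    _, query_date = parse_stamped(query)
    before, after = [], []
    for attribute in attributes:
        code, when = parse_stamped(attribute)
        if when < query_date:
            before.append(code)
        elif when > query_date:
            after.append(code)
    return before, after


# Query attribute from the text: stroke (510.1.4) on Feb. 5, 2003.
query = "510.1.4-05FEB03"
# Hypothetical dated attributes: blood pressure (530.3), diabetes (510.1.1).
others = ["530.3-10JAN01", "510.1.1-20MAR04"]
predating, postdating = split_by_time(others, query)
```

Selecting only the predating list corresponds to looking for causative or predictive relationships, while the postdating list corresponds to consequential or symptomatic ones.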

As further shown in FIG. 5, physical prognoses subclass 510.2 can contain attributes related to clinical forecasting of the course and outcome of disease and chances for recovery. Basic physical class 520 can include the attributes age 520.1, sex 520.2, height 520.3, weight 520.4, and ethnicity 520.5, whose values provide basic physical information about the individual. Hormone levels 530.1 and strength/endurance 530.4 are examples of attribute subclasses within detailed physical class 530. Hormone levels 530.1 can include attributes for testosterone level, estrogen level, progesterone level, thyroid hormone level, insulin level, pituitary hormone level, and growth hormone level, for example. Strength/endurance 530.4 can include attributes for various weight lifting capabilities, stamina, running distance and times, and heart rates under various types of physical stress, for example. Blood sugar level 530.2, blood pressure 530.3 and body mass index 530.5 are examples of attributes whose values provide detailed physical information about the individual. Historical physical health class 510 h, historical basic physical class 520 h and historical detailed physical class 530 h are examples of historical attribute classes. Historical physical health class 510 h can include historical attribute subclasses such as historical physical diagnoses class 510.1 h which would include attributes for past physical diagnoses of various diseases and physical health conditions which may or may not be representative of the individual's current health state. Historical basic physical class 520 h can include attributes such as historical height class 520.1 h which can contain heights measured at particular ages. Historical detailed physical class 530 h can include attributes and attribute classes such as the historical hormone levels class 530.1 h which would include attributes for various hormone levels measured at various time points in the past.

In one embodiment, the classes and indexing illustrated in FIG. 5 and disclosed above can be matched to health insurance information such as health insurance codes, such that information collected by health care professionals (such as clinician 820 of FIG. 8, which can be a physician, nurse, nurse practitioner or other health care professional) can be directly incorporated as attribute data. In this embodiment, the health insurance database can directly form part of the attribute database, such as one which can be constructed using the classes of FIG. 5.

FIG. 6 illustrates classes of situational attributes as defined by situational attributes metaclass 600, which in one embodiment can include medical class 610, exposures class 620, and financial class 630, for example. In one embodiment, medical class 610 can include treatments subclass 610.1 and medications subclass 610.2; exposures class 620 can include environmental exposures subclass 620.1, occupational exposures subclass 620.2 and self-produced exposures 620.3; and financial class 630 can include assets subclass 630.1, debt subclass 630.2 and credit report subclass 630.3. Historical medical class 610 h can include historical treatments subclass 610.1 h, historical medications subclass 610.2 h, historical hospitalizations subclass 610.3 h and historical surgeries subclass 610.4 h. Other historical classes included within the situational attributes metaclass 600 can be historical exposures subclass 620 h, historical financial subclass 630 h, historical income history subclass 640 h, historical employment history subclass 650 h, historical marriage/partnerships subclass 660 h, and historical education subclass 670 h.

In one embodiment, commercial databases such as credit databases and databases containing purchase information (e.g., frequent shopper information) can be used either as the basis for extracting attributes for classes such as those in financial subclass 630 and historical financial subclass 630 h, or for direct mapping of the information in those databases to situational attributes. Similarly, accounting information such as that maintained by the consumer 810 of FIG. 8, or by a representative of the consumer (e.g., the consumer's accountant), can also be incorporated, transformed, or mapped into the classes of attributes shown in FIG. 6.

Measurement of financial attributes such as those illustrated and described with respect to FIG. 6 allows financial attributes such as assets, debt, credit rating, income and historical income to be utilized in the methods, systems, software and databases described herein. In some instances, such financial attributes can be important with respect to a query attribute. Similarly, other situational attributes such as the number of marriages/partnerships, length of marriages/partnerships, number of jobs held, and income history can be important attributes and will be found to be related to certain query attributes. In one embodiment a significant number of attributes described in FIG. 6 are extracted from public or private databases, either directly or through manipulation, interpolation, or calculations based on the data in those databases.

FIG. 7 illustrates classes of behavioral attributes as defined by behavioral attributes metaclass 700, which in one embodiment can include mental health class 710, habits class 720, time usage class 730, mood/emotional state class 740, and intelligence quotient class 750, for example. In one embodiment, mental health class 710 can include mental/behavioral diagnoses subclass 710.1 and mental/behavioral prognoses subclass 710.2; habits class 720 can include diet subclass 720.1, exercise subclass 720.2, alcohol consumption subclass 720.3, substances usage subclass 720.4, and sexual activity subclass 720.5; and time usage class 730 can include work subclass 730.1, commute subclass 730.2, television subclass 730.3, exercise subclass 730.4 and sleep subclass 730.5. Behavioral attributes metaclass 700 can also include historical classes such as historical mental health class 710 h, historical habits 720 h, and historical time usage class 730 h.

As discussed with respect to FIGS. 5 and 6, in one embodiment, external databases such as health care provider databases, purchase records and credit histories, and time tracking systems can be used to supply the data which constitutes the attributes of FIG. 7. Also with respect to FIG. 7, classification systems used by mental health professionals, such as the classifications found in the DSM-IV, can be used directly, such that the attributes of mental health class 710 and historical prior mental health class 710 h have a direct correspondence to the DSM-IV. The classes and objects of the present invention, as described with respect to FIGS. 5, 6 and 7, can be implemented using a number of database architectures including, but not limited to, flat files, relational databases and object oriented databases.

Unified Modeling Language (“UML”) can be used to model and/or describe methods and systems and provide the basis for better understanding their functionality and internal operation as well as describing interfaces with external components, systems and people using standardized notation. When used herein, UML diagrams including, but not limited to, use case diagrams, class diagrams and activity diagrams, are meant to serve as an aid in describing the embodiments of the present invention but do not constrain implementation thereof to any particular hardware or software embodiments. Unless otherwise noted, the notation used with respect to the UML diagrams contained herein is consistent with the UML 2.0 specification or variants thereof and is understood by those skilled in the art.

FIG. 8 illustrates a use case diagram for an attribute determination system 800 which, in one embodiment, allows for the determination of attributes which are statistically relevant or related to a query attribute. Attribute determination system 800 allows for a consumer 810, clinician 820, and genetic database administrator 830 to interact, although the multiple roles may be filled by a single individual, to input attributes and query the system regarding which attributes are relevant to the specified query attribute. In a contribute genetic sample use case 840 a consumer 810 contributes a genetic sample.

In one embodiment this involves the contribution by consumer 810 of a swab of the inside of the cheek, a blood sample, or contribution of other biological specimen associated with consumer 810 from which genetic and epigenetic data can be obtained. In one embodiment, genetic database administrator 830 causes the genetic sample to be analyzed through a determine genetic and epigenetic attributes use case 850. Consumer 810 or clinician 820 may collect physical attributes through a describe physical attributes use case 842. Similarly, behavioral, situational, and historical attributes are collected from consumer 810 or clinician 820 via describe behavioral attributes use case 844, describe situational attributes use case 846, and describe historical attributes use case 848, respectively. Clinician 820 or consumer 810 can then enter a query attribute through receive query attribute use case 852. Attribute determination system 800 then, based on attributes of large query-attribute-positive and query-attribute-negative populations, determines which attributes and combinations of attributes, extending across the pangenetic (genetic/epigenetic), physical, behavioral, situational, and historical attribute categories, are statistically related to the query attribute. As previously discussed, and with respect to FIG. 1 and FIGS. 4-6, historical attributes can, in certain embodiments, be accounted for through the other categories of attributes. In this embodiment, describe historical attributes use case 848 is effectively accomplished through determine genetic and epigenetic attributes use case 850, describe physical attributes use case 842, describe behavioral attributes use case 844, and describe situational attributes use case 846.

With respect to the aforementioned method of collection, inaccuracies can occur, sometimes due to outright misrepresentations of the individual's habits. For example, it is not uncommon for patients to self-report alcohol consumption levels which are significantly below actual levels. This can occur even when a clinician/physician is involved, as the patient reports consumption levels to the clinician/physician that are significantly below their actual consumption levels. Similarly, it is not uncommon for an individual to over-report the amount of exercise they get.

In one embodiment, disparate sources of data including consumption data as derived from purchase records, data from blood and urine tests, and other observed characteristics are used to derive attributes such as those shown in FIGS. 5-7. By analyzing sets of disparate data, corrections to self-reported data can be made to produce more accurate determinations of relevant attributes. In one embodiment, heuristic rules are used to generate attribute data based on measured, rather than self-reported attributes. Heuristic rules are defined as rules which relate measurable (or accurately measurable) attributes to less measurable or less reliable attributes such as those from self-reported data. For example, an individual's recorded purchases including cigarette purchases can be combined with urine analysis or blood test results which measure nicotine levels or another tobacco related parameter, and heuristic rules can be applied to estimate cigarette consumption level. As such, one or more heuristic rules, typically based on research which statistically links a variety of parameters, can be applied by data conversion/formatting engine 220 to the data representing the number of packs of cigarettes purchased by an individual or household, results of urine or blood tests, and other studied attributes, to derive an estimate of the extent to which the individual smokes.

In one embodiment, the heuristic rules take into account attributes such as household size and self-reported data to assist in the derivation of the desired attribute. For example, if purchase data is used in a heuristic rule, household size and even the number of self-reported smokers in the household can be used to help determine actual levels of consumption of tobacco by the individual. In one embodiment, household members are tracked individually, and the heuristic rules provide for the ability to approximately assign consumption levels to different people in the household. Details such as individual brand usages or preferences may be used to help assign consumptions within the household. As such, in one embodiment the heuristic rules can be applied by data conversion/formatting engine 220 to a number of disparate pieces of data to assist in extracting one or more attributes.
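As a purely illustrative sketch of such a heuristic rule, the following Python fragment combines household purchase data, the number of self-reported smokers, a self-reported consumption level and a test result into one estimate. The class, field and function names, and the rule itself, are assumptions made for illustration; they are not taken from the disclosure and are not based on any published research.

```python
from dataclasses import dataclass


@dataclass
class SmokingEvidence:
    """Hypothetical inputs to a heuristic rule (all names are illustrative)."""
    household_packs_per_week: float      # from purchase records
    self_reported_smokers: int           # smokers reported in the household
    self_reported_packs_per_week: float  # the individual's own report
    nicotine_marker_positive: bool       # blood or urine test indicates tobacco use


def estimate_packs_per_week(evidence: SmokingEvidence) -> float:
    """Estimate an individual's cigarette consumption in packs per week.

    Assumed rule: if the test does not indicate tobacco use, keep the
    self-reported value; otherwise split household purchases evenly among
    the reported smokers and never go below the self-reported value.
    """
    if not evidence.nicotine_marker_positive:
        return evidence.self_reported_packs_per_week
    smokers = max(evidence.self_reported_smokers, 1)  # avoid division by zero
    purchased_share = evidence.household_packs_per_week / smokers
    return max(purchased_share, evidence.self_reported_packs_per_week)


evidence = SmokingEvidence(
    household_packs_per_week=14,
    self_reported_smokers=2,
    self_reported_packs_per_week=3,
    nicotine_marker_positive=True,
)
print(estimate_packs_per_week(evidence))  # 7.0
```

An even split among smokers is only one possible assignment; brand preferences or individually tracked purchases, as mentioned above, could replace it.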

Physical, behavioral, situational and historical attribute data may be stored or processed in a manner that allows retention of maximum resolution and accuracy of the data while also allowing flexible comparison of the data so that important shared similarities between individuals are not overlooked. This can be important when processing narrow and extreme attribute values, or when using smaller populations of individuals where the reduced number of individuals makes the occurrence of identical matches of attributes rare. In these and other circumstances, flexible treatment and comparison of attributes can reveal predisposing attributes that are related to or legitimately derive from the original attribute values but have broader scope, lower resolution, and extended or compounded values compared to the original attributes. In one embodiment, attributes and attribute values can be qualitative (categorical) or quantitative (numerical). In another embodiment, attributes and attribute values can be discrete or continuous numerical values.

There are several ways flexible treatment and comparison of attributes can be accomplished. As shown in FIG. 2, one approach is to incorporate data conversion/formatting engine 220 which is able to create expanded 1st dataset 202 from 1st dataset 200. In one embodiment, 1st dataset 200 can comprise one or more primary attributes, or original attribute profiles containing primary attributes, and expanded 1st dataset 202 can comprise one or more secondary attributes, or expanded attribute profiles containing secondary attributes. A second approach is to incorporate functions into attribute comparison engine 222 that allow it to expand the original attribute data into additional values or ranges during the comparison process. This provides the functional equivalent of reformatting the original dataset without having to create and store the entire set of expanded attribute values.

In one embodiment, original attributes (primary attributes) can be expanded into one or more sets containing derived attributes (secondary attributes) having values, levels or degrees that are above, below, surrounding or including that of the original attributes. In one embodiment, original attributes can be used to derive attributes that are broader or narrower in scope than the original attributes. In one embodiment, two or more original attributes can be used in a computation (i.e., compounded) to derive one or more attributes that are related to the original attributes. As shown in FIG. 9A, a historical situational attribute indicating a time span of smoking, from age 25-27, and a historical behavioral attribute indicating a smoking habit, 10 packs per week, may be compounded to form a single value for the historical situational attribute of total smoking exposure to date, 1560 packs, as shown in FIG. 9B, by simply multiplying 156 weeks by 10 packs/week. Similar calculations enable the derivation of historical situational attributes such as total nicotine and total cigarette tar exposure based on known levels of nicotine and tar in the specific brand smoked (Marlboro, as indicated by the cigarette brand attribute), multiplied by the total smoking exposure to date. In another example, a continuous numerical attribute, {time=5.213 seconds}, can be expanded to derive the discrete numerical attribute, {time=5 seconds}.
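The compounding and discretization described above can be sketched in Python as follows. This is a minimal illustration, assuming that the age span 25-27 is inclusive (three years of 52 weeks each, giving the 156 weeks used above); the function names are hypothetical.

```python
WEEKS_PER_YEAR = 52


def total_smoking_exposure(start_age: int, end_age: int, packs_per_week: float) -> float:
    """Compound a smoking time span and a smoking habit into total packs smoked.

    The age span is treated as inclusive, so ages 25-27 count as 3 years.
    """
    years = end_age - start_age + 1
    weeks = years * WEEKS_PER_YEAR
    return weeks * packs_per_week


def discretize(value: float) -> int:
    """Derive a discrete numerical attribute from a continuous one by rounding."""
    return round(value)


print(total_smoking_exposure(25, 27, 10))  # 1560
print(discretize(5.213))                   # 5
```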

Attribute expansion of a discrete numerical attribute, such as age, can be exemplified in one embodiment using a population comprised of four individuals ages 80, 66, 30 and 15. In this example, Alzheimer's disease is the query attribute, and both the 80 year old and the 66 year old individual have Alzheimer's disease, as indicated by an attribute for a positive Alzheimer's diagnosis in their attribute profiles. Therefore, for this small population, the 80 and 66 year old individuals constitute the query-attribute-positive group (the group associated with the query attribute). If a method of discovering attribute associations is executed, none of the attribute combinations identified as being statistically associated with the query attribute will include age, since the numerical age attributes 80 and 66 are not identical. However, it is already known from empirical scientific research that Alzheimer's disease is an age-associated disease, with prevalence of the disease being much higher in the elderly. By using the original (primary) age attributes to derive new (secondary) age attributes, a method of discovering attribute associations can appropriately identify attribute combinations that contain age as a predisposing attribute for Alzheimer's disease based on the query-attribute-positive group of this population. To accomplish this, a procedure of attribute expansion derives lower resolution secondary age attributes from the primary age attributes and consequently expands the attribute profiles of the individuals in this population. This can be achieved by either categorical expansion or numerical expansion.

In one embodiment of a categorical attribute expansion, primary numerical age attributes are used to derive secondary categorical attributes selected from the following list: infant (ages 0-1), toddler (ages 1-3), child (ages 4-8), preadolescent (ages 9-12), adolescent (ages 13-19), young adult (ages 20-34), mid adult (ages 35-49), late adult (ages 50-64), and senior (ages 65 and up). This particular attribute expansion will derive the attribute ‘senior’ for the 80 year old individual, ‘senior’ for the 66 year old, ‘young adult’ for the 30 year old, and ‘adolescent’ for the 15 year old. These derived attributes can be added to the respective attribute profiles of these individuals to create an expanded attribute profile for each individual. As a consequence of this attribute expansion procedure, the 80 and 66 year old individuals will both have expanded attribute profiles containing an identical age attribute of ‘senior’, which will then be identified in attribute combinations that are statistically associated with the query attribute of Alzheimer's disease, based on a higher frequency of occurrence of this attribute in the query-attribute-positive group for this example.
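A minimal Python sketch of this categorical expansion is given below. The age bands are those listed above; because the infant and toddler bands both include age 1, the sketch assumes that the first matching band is used. The profile layout (a dictionary with an "age" key) and the function names are illustrative assumptions.

```python
# (category, lowest age, highest age); None means no upper limit.
AGE_BANDS = [
    ("infant", 0, 1),
    ("toddler", 1, 3),
    ("child", 4, 8),
    ("preadolescent", 9, 12),
    ("adolescent", 13, 19),
    ("young adult", 20, 34),
    ("mid adult", 35, 49),
    ("late adult", 50, 64),
    ("senior", 65, None),
]


def age_category(age: int) -> str:
    """Return the first age band that contains the given age."""
    for name, low, high in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return name
    raise ValueError(f"no age band for age {age}")


def expand_profile(profile: dict) -> dict:
    """Add the secondary categorical age attribute while keeping the primary age."""
    expanded = dict(profile)
    expanded["age_category"] = age_category(profile["age"])
    return expanded


population = [{"age": 80}, {"age": 66}, {"age": 30}, {"age": 15}]
for person in population:
    print(expand_profile(person))
# {'age': 80, 'age_category': 'senior'}
# {'age': 66, 'age_category': 'senior'}
# {'age': 30, 'age_category': 'young adult'}
# {'age': 15, 'age_category': 'adolescent'}
```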

As an alternative to the above categorical expansion, a numerical attribute expansion can be performed in which numerical age is used to derive a set of secondary numerical attributes comprising a sequence of inequality statements containing progressively larger numerical values than the actual age and a set of secondary attributes comprising a sequence of inequality statements containing progressively smaller quantitative values than the actual age. For example, attribute expansion can produce the following two sets of secondary age attributes for the 80 year old: {110>age, 109>age . . . , 82>age, 81>age} and {age>79, age>78 . . . , age>68, age>67, age>66, age>65, age>64 . . . , age>1, age>0}. And attribute expansion can produce the following two sets of secondary age attributes for the 66 year old: {110>age, 109>age . . . , 82>age, 81>age, 80>age, 79>age, 78>age . . . , 68>age, 67>age} and {age>65, age>64 . . . , age>1, age>0}.

Identical matches of age attributes found in the largest attribute combination associated with Alzheimer's disease, based on the 80 and 66 year old individuals that have Alzheimer's in this sample population, would contain both of the following sets of age attributes: {110>age, 109>age . . . , 82>age, 81>age} and {age>65, age>64 . . . , age>1, age>0}. This result indicates that being less than 81 years of age but greater than 65 years of age (i.e., having an age in the range: 81>age>65) is a predisposing attribute for having Alzheimer's disease in this population. This particular method of attribute expansion of age into a numerical sequence of inequality statements provides identical matches between at least some of the age attributes between individuals, and provides an intermediate level of resolution between actual age and the broader categorical age attribute of ‘senior’ derived in the first example above.
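The numerical expansion and the resulting shared attributes can be reproduced with a short Python sketch. The upper limit of 110 follows the example above; representing each inequality statement as a string, and the function name, are assumptions made for illustration.

```python
def expand_age(age: int, max_age: int = 110) -> set:
    """Expand a numerical age into two sets of inequality statements.

    One set holds bounds above the age ("81>age", ..., "110>age") and the
    other holds bounds below the age ("age>0", ..., "age>79" for age 80).
    """
    upper = {f"{bound}>age" for bound in range(age + 1, max_age + 1)}
    lower = {f"age>{bound}" for bound in range(0, age)}
    return upper | lower


# Attributes shared by the two query-attribute-positive individuals.
shared = expand_age(80) & expand_age(66)

print("81>age" in shared, "80>age" in shared)  # True False
print("age>65" in shared, "age>66" in shared)  # True False
```

The tightest shared bounds, 81>age and age>65, correspond to the range 81>age>65 stated above.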

Expansion of age attributes can also be used for instances in which age is used to designate a point in life at which a specific activity or behavior occurred. For example, FIG. 9 demonstrates an example in which the actual ages of exposure to smoking cigarettes, ages 25-27, are expanded into a low resolution categorical age attribute of ‘adult’, a broader numerical age range of ‘21-30’, and a set of age attributes comprising a sequence of progressively larger numerical inequality statements for age of the individual, {age>24, age>23 . . . , age>2, age>1}.

Attribute expansion can also be used to reduce the amount of genetic information to be processed by the methods of the present invention, essentially 3 billion nucleotides of information per individual and numerous combinations comprised thereof. For example, attribute expansion can be used to derive a set of lower resolution genetic attributes (e.g., categorical genetic attributes such as names) that can be used instead of the whole genomic sequence in the methods. Categorical genetic attributes can be assigned based on only one or a few specific nucleotide attributes out of hundreds or thousands in a sequence segment (e.g., a gene, or a DNA or RNA sequence read). However, using only lower resolution categorical genetic attributes may cause the same inherent limitations of sensitivity as using only SNPs and genomic markers, which represent only a portion of the full genomic sequence content. So, while categorical genetic attributes can be used to greatly decrease processing times required for execution of the methods, they exact a cost in terms of loss of information when used in place of the full high resolution genomic sequence, and the consequence of this can be the failure to identify certain predisposing genetic variations during execution of the methods. In one embodiment, this can show up statistically in the form of attribute combinations having lower strengths of association with query attributes and/or an inability to identify any attribute combination having an absolute risk of 1.0 for association with a query attribute. So the use of descriptive genetic attributes would be most suitable, and the accuracy and sensitivity of the methods increased, once the vast majority of influential genetic variations in the genome (both in gene encoding regions and non-coding regions) have been identified and can be incorporated into rules for assigning categorical genetic attributes.

Instead of being appended to the whole genome sequence attribute profile of an individual, categorical genetic attributes can be used to create a separate genetic attribute profile for the individual that comprises thousands of genetic descriptors, rather than billions of nucleotide descriptors. As an example, 19 different nucleotide mutations have been identified in the Cystic Fibrosis Conductance Regulator Gene, each of which can disrupt function of the gene's encoded protein product resulting in clinical diagnosis of cystic fibrosis disease. Since this is the major known disease associated with this gene, the presence of any of the 19 mutations can be the basis for deriving a single lower resolution attribute of ‘CFCR gene with cystic fibrosis mutation’ with a status value of {1=Yes} to represent possession of the genomic sequence of one of the diseased variations of this gene, with the remaining sequence of the gene ignored. For individuals that do not possess any of the 19 mutations in their copies of the gene, the attribute ‘CFCR gene with cystic fibrosis mutation’ and a status value {0=No} can be derived. This approach not only reduces the amount of genetic information that needs to be processed, it allows for creation of an identical genetic attribute associated with 19 different individuals, each possessing one of 19 different nucleotide mutations in the Cystic Fibrosis Conductance Regulator Gene, but all having the same gene mutated and sharing the same disease of cystic fibrosis. This allows for identification of an identical genetic attribute within their attribute profiles with respect to defect of the CFCR gene without regard for which particular nucleotide mutation is responsible for the defect. This type of attribute expansion can be performed for any genetic sequence, not just gene encoding sequences, and need not be related to disease phenotypes. Further, the genetic attribute descriptors can be names or numeric codes, for example. In one embodiment, a single categorical genetic attribute descriptor can be used to represent a collection of nucleotide variations occurring simultaneously across multiple locations of a genetic sequence or genome.
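The derivation of a single categorical genetic attribute from any one of several mutations can be sketched in Python as follows. The mutation identifiers ("mut_01" through "mut_19") are placeholders invented for this illustration and do not name real variants; the function name and profile layout are likewise assumptions.

```python
# Placeholder identifiers standing in for the 19 disease-associated mutations.
DISEASE_MUTATIONS = frozenset(f"mut_{i:02d}" for i in range(1, 20))

ATTRIBUTE_NAME = "CFCR gene with cystic fibrosis mutation"


def derive_categorical_genetic_attribute(observed_variants) -> dict:
    """Return the categorical attribute with status 1 (Yes) or 0 (No).

    The status is 1 if the individual carries at least one of the listed
    mutations; the rest of the gene sequence is ignored.
    """
    carries_mutation = bool(set(observed_variants) & DISEASE_MUTATIONS)
    return {ATTRIBUTE_NAME: 1 if carries_mutation else 0}


# Two individuals with different mutations receive an identical attribute.
print(derive_categorical_genetic_attribute(["mut_03"]))  # status 1
print(derive_categorical_genetic_attribute(["mut_17"]))  # status 1
print(derive_categorical_genetic_attribute([]))          # status 0
```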

Similar to expansion of genetic attributes, attribute expansion can be performed with epigenetic attributes. For example, multiple DNA methylation modifications are known to occur simultaneously at different nucleotide positions within DNA segments and can act in a cooperative manner to effect regulation of expression of one gene, or even a collection of genes located at a chromosomal locus. Based on information which indicates that several different patterns of epigenetic DNA methylation, termed epigenetic polymorphisms, can produce the same phenotypic effect, a single categorical epigenetic attribute descriptor can be derived as a descriptor for that group of epigenetic DNA methylation patterns, thereby ensuring the opportunity for an epigenetic attribute match between individuals sharing predisposition to the same outcome but having a different epigenetic polymorphism that produces that outcome. For example, it has been suggested by researchers that several different patterns of epigenetic modification of the HTR2A serotonin gene locus are capable of predisposing an individual to schizophrenia. For individuals associated with one of these particular schizophrenia-predisposing epigenetic patterns, the same categorical epigenetic attribute of ‘HTR2A epigenetic schizophrenia pattern’ with a status value of {1=yes} can be derived. For an individual who is negative for all known schizophrenia-predisposing epigenetic patterns in the HTR2A gene, the categorical epigenetic attribute of ‘HTR2A epigenetic schizophrenia pattern’ with a status value {0=no} can be derived to indicate that the individual does not possess any of the epigenetic modifications of the HTR2A serotonin gene locus that are associated with predisposition to schizophrenia.

In one embodiment, the original attribute value is retained and the expanded attribute values are provided in addition, allowing the opportunity to detect similarities at both the maximal resolution level provided by the original attribute value and the lower level of resolution and/or broader coverage provided by the expanded attribute values or attribute value range. In one embodiment, attribute values are determined from detailed questionnaires which are completed by the consumer/patient directly or with the assistance of clinician 820. Based on these questionnaires, attribute values such as those shown in FIGS. 9A and 9B can be derived. In one or more embodiments, when tabulating, storing, transmitting and reporting results of methods of the present invention, wherein the results include both narrow attributes and broad attributes that encompass those narrow attributes, the broader attributes may be included and the narrow attributes eliminated, filtered or masked in order to reduce the complexity and lengthiness of the final results.

Attribute expansion can be used in a variety of embodiments, many of which are described in the present disclosure, in which statistical associations between attribute combinations and one or more query attributes are determined, identified or used. As such, attribute expansion can be performed to create expanded attribute profiles that are more strongly associated with a query attribute than the attribute profiles from which they were derived. As explained previously, attribute expansion can accomplish this by introducing predisposing attributes that were missing or introducing attributes of the correct resolution for maximizing attribute identities between attribute profiles of a group of query-attribute-positive individuals. In effect, expansion of attribute profiles can reveal predisposing attributes that were previously masked from detection and increase the ability of a method that uses the expanded attribute profiles to predict an individual's risk of association with a query attribute with greater accuracy and certainty as reflected by absolute risk results that approach either 1.0 (certainty of association) or 0.0 (certainty of no association) and have higher statistical significance. To avoid introducing bias error into methods of the present invention, expansion of attribute profiles should be performed according to a set of rules, which can be predetermined, so that identical types of attributes are expanded in the attribute profiles of all individuals processed by the methods. For example, if a method processes the attribute profiles of a group of query-attribute-positive individuals and a group of query-attribute-negative individuals, and the query-attribute-positive individuals have had their primary age attributes expanded into secondary categorical age attributes which have been added to their attribute profiles, then attribute expansion of the primary age attributes of the query-attribute-negative individuals should also be performed according to the same rules used for the query-attribute-positive individuals before processing any of the attribute profiles by the method. Ensuring uniform application of attribute expansion across a collection of attribute profiles will minimize the risk of introducing considerable bias into those methods that use expanded attribute profiles or data derived from them.

Consistent with the various embodiments of the present invention disclosed herein, computer based systems (which can comprise a plurality of subsystems), datasets, databases and software can be implemented for methods of generating and using secondary attributes and expanded attribute profiles.

In one embodiment, a computer based method for compiling attribute combinations using expanded attribute combinations is provided. A query attribute is received, and a set of expanded attribute profiles associated with a group of query-attribute-positive individuals and a set of expanded attribute profiles associated with a group of query-attribute-negative individuals are accessed, both sets of expanded attribute profiles comprising a set of primary attributes and a set of secondary attributes, wherein the set of secondary attributes is derived from the set of primary attributes and has lower resolution than the set of primary attributes. Attribute combinations having a higher frequency of occurrence in the set of expanded attribute profiles associated with the group of query-attribute-positive individuals than in the set of expanded attribute profiles associated with the group of query-attribute-negative individuals are identified. The identified attribute combinations are stored to create a compilation of attribute combinations that co-occur (i.e., co-associate, co-aggregate) with the query attribute, thereby generating what can be termed an ‘attribute combination database’.
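The compilation step can be illustrated with the following Python sketch, which enumerates attribute combinations found in the query-attribute-positive profiles and keeps those occurring more frequently in the positive group than in the negative group. It is a brute-force illustration under stated assumptions, not the disclosed implementation: profiles are modeled as sets of attribute strings, the combination size is capped by a hypothetical max_size parameter, and an in-memory dictionary stands in for the attribute combination database.

```python
from itertools import combinations


def frequency(combo: frozenset, profiles: list) -> float:
    """Fraction of profiles that contain every attribute in the combination."""
    if not profiles:
        return 0.0
    return sum(1 for profile in profiles if combo <= profile) / len(profiles)


def compile_attribute_combinations(positive: list, negative: list, max_size: int = 2) -> dict:
    """Map each combination that is more frequent in the positive group
    to its (positive frequency, negative frequency) pair."""
    candidates = set()
    for profile in positive:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(profile), size):
                candidates.add(frozenset(combo))

    compiled = {}
    for combo in candidates:
        freq_pos = frequency(combo, positive)
        freq_neg = frequency(combo, negative)
        if freq_pos > freq_neg:
            compiled[combo] = (freq_pos, freq_neg)
    return compiled


# Expanded profiles from the Alzheimer's example (primary age plus category).
positive = [{"age=80", "senior"}, {"age=66", "senior"}]
negative = [{"age=30", "young adult"}, {"age=15", "adolescent"}]

database = compile_attribute_combinations(positive, negative)
print(database[frozenset({"senior"})])  # (1.0, 0.0)
print(database[frozenset({"age=80"})])  # (0.5, 0.0)
```

Only the secondary attribute ‘senior’ is shared by every query-attribute-positive profile, which is the effect of expansion described earlier.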

In one embodiment, a computer based method for expanding attribute profiles to increase the strength of association between a query attribute and a set of attribute profiles associated with query-attribute-positive individuals is provided. A query attribute is received, and a set of attribute profiles associated with a group of query-attribute-positive individuals and a set of attribute profiles associated with a group of query-attribute-negative individuals are accessed. A first statistical result indicating strength of association of the query attribute with an attribute combination having a higher frequency of occurrence in the set of attribute profiles associated with the group of query-attribute-positive individuals than in the set of attribute profiles associated with the group of query-attribute-negative individuals is determined. One or more attributes in the set of attribute profiles associated with the group of query-attribute-positive individuals and one or more attributes in the set of attribute profiles associated with the query-attribute-negative individuals are expanded to create a set of expanded attribute profiles associated with the group of query-attribute-positive individuals and a set of expanded attribute profiles associated with the group of query-attribute-negative individuals. A second statistical result indicating strength of association of the query attribute with an attribute combination having a higher frequency of occurrence in the set of expanded attribute profiles associated with the group of query-attribute-positive individuals than in the set of expanded attribute profiles associated with the group of query-attribute-negative individuals is determined. If the second statistical result is higher than the first statistical result, the expanded attribute profiles associated with the group of query-attribute-positive individuals and the expanded attribute profiles associated with the group of query-attribute-negative individuals are stored.

In one embodiment, a computer based method for determining attribute associations using an expanded attribute profile is provided. A query attribute is received, and one or more primary attributes in an attribute profile associated with a query-attribute-positive individual are accessed. One or more secondary attributes are then derived from the primary attributes such that the secondary attributes are lower resolution attributes than the primary attributes. The secondary attributes are stored in association with the attribute profile to create an expanded attribute profile. Attribute combinations that are associated with the query attribute are determined by identifying attribute combinations from the expanded attribute profile that have higher frequencies of occurrence in a set of attribute profiles associated with a group of query-attribute-positive individuals than in a set of attribute profiles associated with a group of query-attribute-negative individuals.

In one embodiment, a computer based method for determining attribute associations using an expanded attribute profile is provided in which one or more primary attributes in an attribute profile are accessed. One or more secondary attributes are generated from the primary attributes such that the secondary attributes have lower resolution than the primary attributes. The secondary attributes are stored in association with the attribute profile to create an expanded attribute profile. The strength of association between the expanded attribute profile and a query attribute is determined by comparing the expanded attribute profile to a set of attribute combinations that are statistically associated with the query attribute.
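
The expansion of a profile with lower-resolution secondary attributes can be sketched as follows. This is only an illustration under stated assumptions: the `name=value` attribute encoding, the age-to-decade rule, and the helper names `derive_secondary` and `expand_profile` are hypothetical and do not come from the specification.

```python
def derive_secondary(attribute):
    """Derive a lower-resolution attribute from a primary one (hypothetical rule).

    Assumes primary attributes may look like 'age=47'; the secondary
    attribute is the enclosing decade, e.g. 'age=40-49'. Any other
    attribute yields None (nothing to derive).
    """
    name, _, value = attribute.partition("=")
    if name == "age" and value.isdigit():
        low = (int(value) // 10) * 10
        return f"age={low}-{low + 9}"
    return None


def expand_profile(profile):
    """Return the primary attributes plus every derivable secondary attribute."""
    secondary = {derive_secondary(attribute) for attribute in profile}
    secondary.discard(None)
    return set(profile) | secondary


expanded = expand_profile({"age=47", "smoker"})
print(sorted(expanded))
```

The primary attribute is kept alongside the secondary one, so combinations can match at either resolution.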

The methods, systems, software and databases disclosed herein are able to achieve determination of complex combinations of predisposing attributes not only as a consequence of the resolution and breadth of data used, but also as a consequence of the process methodology used for discovery of predisposing attributes. An attribute may have no effect on expression of another attribute unless it occurs in the proper context, the proper context being co-occurrence with one or more additional predisposing attributes. In combination with one or more additional attributes of the right type and degree, an attribute may be a significant contributor to predisposition of the organism for developing the attribute of interest. This contribution is likely to remain undetected if attributes are evaluated individually. As an example, complex diseases require a specific combination of multiple attributes to promote expression of the disease. The required disease-predisposing attribute combinations will occur in a significant percentage of those that have or develop the disease and will occur at a lower frequency in a group of unaffected individuals.

FIG. 10 illustrates an example of the difference in frequencies of occurrence of attributes when considered in combination as opposed to individually. In the example illustrated, there are two groups of individuals referred to based on their status of association with a query attribute (a specific attribute of interest that can be submitted in a query). One group does not possess (is not associated with) the query attribute, the query-attribute-negative group, and the other does possess (is associated with) the query attribute, the query-attribute-positive group. In one embodiment, the query attribute of interest is a particular disease or trait. The two groups are analyzed for the occurrence of two attributes, A and X, which are candidates for causing predisposition to the disease. When frequencies of occurrence are computed individually for A and for X, the observed frequencies are identical (50%) for both groups. When the frequency of occurrence is computed for the combination of A with X for individuals of each group, the frequency of occurrence is dramatically higher in the positive group compared to the negative group (50% versus 0%). Therefore, while both A and X are significant contributors to predisposition in this theoretical example, their association with expression of the disease in individuals can only be detected by determining the frequency of co-occurrence of A with X in each individual.
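
The two-attribute situation described above can be reproduced with a few lines of code. The profiles below are hypothetical data chosen only so that the individual and combined frequencies match the percentages quoted in the text; they are not taken from FIG. 10 itself.

```python
# Hypothetical profiles arranged to reproduce the percentages quoted above.
positive_group = [{"A", "X"}, {"A", "X"}, set(), set()]
negative_group = [{"A"}, {"A"}, {"X"}, {"X"}]


def frequency(group, combination):
    """Fraction of profiles in `group` containing every attribute in `combination`."""
    return sum(1 for profile in group if combination <= profile) / len(group)


for combo in ({"A"}, {"X"}, {"A", "X"}):
    print(sorted(combo),
          frequency(positive_group, combo),
          frequency(negative_group, combo))
```

Evaluated one at a time, A and X look identical in both groups; only the test for co-occurrence within each individual separates the groups.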

FIG. 11 illustrates another example of the difference in frequencies of occurrence of attributes when considered in combination as opposed to individually. In this example there are again two groups of individuals that are positive or negative for an attribute of interest submitted in a query, which could again be a particular disease or trait of interest. Three genes are under consideration as candidates for causing predisposition to the query attribute. Each of the three genes has three possible alleles (each labeled A, B, or C for each gene). This example not only illustrates the requirement for attributes occurring in combination to cause predisposition, but also the phenomenon that there can be multiple different combinations of attributes that produce the same outcome. In the example, a combination of either all A, all B, or all C alleles for the genes can result in predisposition to the query attribute. The query-attribute-positive group is evenly divided among these three attribute combinations, each having a frequency of occurrence of 33%. The same three combinations occur with 0% frequency in the query-attribute-negative group. However, if the attributes are evaluated individually, the frequency of occurrence of each allele of each gene is an identical 33% in both groups, which would appear to indicate no contribution to predisposition by any of the alleles in one group versus the other. As can be seen from FIG. 11, this is not the case, since every gene allele considered in this example does contribute to predisposition toward the query attribute when occurring in a particular combination of alleles, specifically a combination of all A, all B, or all C. This demonstrates that a method of attribute predisposition determination needs to be able to detect attributes that express their predisposing effect only when occurring in particular combinations. It also demonstrates that the method should be able to detect multiple different combinations of attributes that may all cause predisposition to the same query attribute.

Although the previous two figures present frequencies of occurrence as percentages, for the methods of the present invention the frequencies of occurrence of attribute combinations can be stored as ratios for both the query-attribute-positive individuals and the query-attribute-negative individuals. Referring to FIG. 12A and FIG. 12B, the frequency of occurrence for the query-attribute-positive group is the ratio of the number of individuals of that group having the attribute combination (the exposed query-attribute-positive individuals designated ‘a’) to the total number of individuals in that group (‘a’ plus ‘c’). The number of individuals in the query-attribute-positive group that do not possess the attribute combination (the unexposed query-attribute-positive individuals designated ‘c’) can either be tallied and stored during comparison of attribute combinations, or computed afterward from the stored frequency as the total number of individuals in the group minus the number of exposed individuals in that group (i.e., (a+c)−a=c). For the same attribute combination, the frequency of occurrence for the query-attribute-negative group is the ratio of the number of individuals of that group having the attribute combination (the exposed query-attribute-negative individuals designated ‘b’) to the total number of individuals in that group (‘b’ plus ‘d’). The number of individuals in the query-attribute-negative group that do not possess the attribute combination (the unexposed query-attribute-negative individuals designated ‘d’) can either be tallied and stored during comparison of attribute combinations or can be computed afterward from the stored frequency as the total number of individuals in the group minus the number of exposed individuals in that group (i.e., (b+d)−b=d).
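
As a rough sketch of the bookkeeping described above, the counts a, b, c and d can be obtained by tallying the exposed individuals in each group and subtracting from the group totals. The function name `contingency_counts` and the example profiles are hypothetical; the frequencies are kept as (numerator, denominator) pairs so the group size is not lost.

```python
def contingency_counts(positive_group, negative_group, combination):
    """Return (a, b, c, d) for one attribute combination."""
    a = sum(1 for profile in positive_group if combination <= profile)  # exposed, positive
    b = sum(1 for profile in negative_group if combination <= profile)  # exposed, negative
    c = len(positive_group) - a  # unexposed, positive: (a + c) - a
    d = len(negative_group) - b  # unexposed, negative: (b + d) - b
    return a, b, c, d


# Hypothetical profiles, for illustration only.
positive_group = [{"C", "E"}, {"C", "E"}, {"C"}, {"E"}]
negative_group = [{"C", "E"}, {"C"}, {"E"}, set(), set(), set()]

a, b, c, d = contingency_counts(positive_group, negative_group, {"C", "E"})
positive_frequency = (a, a + c)  # stored as a ratio, not a percentage
negative_frequency = (b, b + d)
print(positive_frequency, negative_frequency)
```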

The frequencies of occurrence of an attribute or attribute combination, when compared for two or more groups of individuals with respect to a query attribute, are statistical results (values) that can indicate strength of association of the attribute combination with a query attribute and can therefore be referred to as corresponding statistical results in one or more embodiments of the present invention. Frequencies of occurrence can also be utilized by statistical computation engine 224 to compute additional statistical results for strength of association (i.e., strength of association values) of the attribute combinations with the query attribute, and these statistical results may also be referred to as corresponding statistical results in one or more embodiments. The statistical measures used to compute these statistical results may include, but are not limited to, prevalence, incidence, probability, absolute risk, relative risk, attributable risk, excess risk, odds (a.k.a. likelihood), and odds ratio (a.k.a. likelihood ratio). Absolute risk (a.k.a. probability), relative risk, odds, and odds ratio are the preferred statistical computations for the present invention. Among these, absolute risk and relative risk are the more preferable statistical computations because their values can still be calculated for an attribute combination in instances where the frequency of occurrence of the attribute combination in the query-attribute-negative group is zero. Odds and odds ratio are undefined in instances where the frequency of occurrence of the attribute combination in the query-attribute-negative group is zero, because in that situation their computation requires division by zero which is mathematically undefined. One embodiment of the present invention, when supplied with ample data, is expected to routinely yield frequencies of occurrence of zero in query-attribute-negative groups because of its ability to discover large predisposing attribute combinations that are exclusively associated with the query attribute.

FIG. 12B illustrates formulas for the statistical measures that can be used to compute statistical results. In one embodiment absolute risk is computed as the probability that an individual has or will develop the query attribute, given exposure to an attribute combination. In one embodiment, relative risk is computed as the ratio of the probability that an exposed individual has or will develop the query attribute to the probability that an unexposed individual has or will develop the query attribute. In one embodiment, odds is computed as the ratio of the probability that an exposed individual has or will develop the query attribute (absolute risk of the exposed query-attribute-positive individuals) to the probability that an exposed individual does not have and will not develop the query attribute (absolute risk of the exposed query-attribute-negative individuals). In one embodiment, the odds ratio is computed as the ratio of the odds that an exposed individual has or will develop the query attribute to the odds that an unexposed individual has or will develop the query attribute.
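
The four measures can be sketched in terms of the counts a, b, c and d introduced earlier. The formulas in the code are a reading of the verbal definitions above (absolute risk a/(a+b), relative risk [a/(a+b)]/[c/(c+d)], odds a/b, odds ratio (a/b)/(c/d)), not a transcription of FIG. 12B. The function names, and the use of `None` to stand for a mathematically undefined result, are assumptions made for illustration.

```python
from fractions import Fraction


def safe_div(numerator, denominator):
    """Exact division; None stands for a mathematically undefined result."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return Fraction(numerator) / Fraction(denominator)


def association_measures(a, b, c, d):
    """Compute the four preferred measures from the 2x2 counts."""
    absolute_risk = safe_div(a, a + b)    # P(query attribute | exposed)
    unexposed_risk = safe_div(c, c + d)   # P(query attribute | unexposed)
    odds = safe_div(a, b)                 # exposed positives : exposed negatives
    unexposed_odds = safe_div(c, d)
    return {
        "absolute_risk": absolute_risk,
        "relative_risk": safe_div(absolute_risk, unexposed_risk),
        "odds": odds,
        "odds_ratio": safe_div(odds, unexposed_odds),
    }


typical = association_measures(3, 1, 1, 3)
exclusive = association_measures(10, 0, 5, 85)  # no exposed negatives (b = 0)
print(typical)
print(exclusive)
```

In the second call the combination never occurs in the negative group, so odds and odds ratio come back undefined while absolute risk and relative risk remain computable, which is the reason given above for preferring the latter two.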

In one embodiment, results for absolute risk and relative risk can be interpreted as follows with respect to an attribute combination predicting association with a query attribute: 1) if absolute risk=1.0, and relative risk is mathematically undefined, then the attribute combination is sufficient and necessary to cause association with the query attribute, 2) if absolute risk=1.0, and relative risk is not mathematically undefined, then the attribute combination is sufficient but not necessary to cause association with the query attribute, 3) if absolute risk<1.0, and relative risk is not mathematically undefined, then the attribute combination is neither sufficient nor necessary to cause association with the query attribute, and 4) if absolute risk<1.0, and relative risk is mathematically undefined, then the attribute combination is not sufficient but is necessary to cause association with the query attribute. In an alternate embodiment, a relative risk that is mathematically undefined can be interpreted to mean that there are two or more attribute combinations, rather than just one attribute combination, that can cause association with the query attribute. In one embodiment, an absolute risk<1.0 can be interpreted to mean one or more of the following: 1) the association status of one or more attributes, as provided to the methods, is inaccurate or missing (null), 2) not enough attributes have been collected, provided to or processed by the methods, or 3) the resolution afforded by the attributes that have been provided is too narrow or too broad. These interpretations can be used to increase accuracy and utility of the methods for use in many applications including but not limited to attribute combination discovery, attribute prediction, predisposition prediction, predisposition modification and destiny modification.
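
The four-way interpretation in the first embodiment above reduces to two independent tests, which a small sketch makes explicit. The function name `interpret` and the convention of passing `None` for a mathematically undefined relative risk are assumptions for illustration.

```python
def interpret(absolute_risk, relative_risk):
    """Classify an attribute combination as sufficient and/or necessary.

    `relative_risk` is None when it is mathematically undefined.
    """
    return {
        "sufficient": absolute_risk == 1.0,  # every exposed individual is positive
        "necessary": relative_risk is None,  # undefined relative risk
    }


print(interpret(1.0, None))  # case 1: sufficient and necessary
print(interpret(1.0, 18.0))  # case 2: sufficient but not necessary
print(interpret(0.6, 2.5))   # case 3: neither
print(interpret(0.6, None))  # case 4: necessary but not sufficient
```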

The statistical results obtained from computing the statistical measures, as well as the attribute combinations to which they correspond, can be subjected to inclusion, elimination, filtering, and evaluation based on meeting one or more statistical requirements which may be predetermined, predesignated, preselected or alternatively, computed de novo based on the statistical results. Statistical requirements can include, but are not limited to, numerical thresholds, statistical minimum or maximum values, and statistical significance (confidence) values which may collectively be referred to as predetermined statistical thresholds. Ranks (e.g., numerical rankings) assigned to attribute combinations based on their attribute content and/or the corresponding statistical results can likewise be subjected to inclusion, elimination, filtering, and evaluation based on a predetermined threshold, in this case applied to rank, which can be specified by a user or by the computer system implementing the methods.

One embodiment of the present invention can be used in many types of statistical analyses including but not limited to Bayesian analyses (e.g., Bayesian probabilities, Bayesian classifiers, Bayesian classification tree analyses, Bayesian networks), linear regression analyses, non-linear regression analyses, multiple linear regression analyses, uniform analyses, Gaussian analyses, hierarchical analyses, recursive partitioning (e.g., classification and regression trees), resampling methods (e.g., bootstrapping, cross-validation, jackknife), Markov methods (e.g., Hidden Markov Models, Regular Markov Models, Markov Blanket algorithms), kernel methods (e.g., Support Vector Machine, Fisher's linear discriminant analysis, principal component analysis, canonical correlation analysis, ridge regression, spectral clustering, matching pursuit, partial least squares), multivariate data analyses including cluster analyses, discriminant analyses and factor analyses, parametric statistical methods (e.g., ANOVA), and non-parametric inferential statistical methods (e.g., binomial test, Anderson-Darling test, chi-square test, Cochran's Q, Cohen's kappa, Efron-Petrosian Test, Fisher's exact test, Friedman two-way analysis of variance by ranks, Kendall's tau, Kendall's W, Kolmogorov-Smirnov test, Kruskal-Wallis one-way analysis of variance by ranks, Kuiper's test, Mann-Whitney U or Wilcoxon rank sum test, McNemar's test, median test, Pitman's permutation test, Siegel-Tukey test, Spearman's rank correlation coefficient, Student-Newman-Keuls test, Wald-Wolfowitz runs test, Wilcoxon signed-rank test).

In one embodiment, the methods, databases, software and systems of the present invention can be used to produce data for use in and/or results for the above statistical analyses. In another embodiment, the methods, databases, software and systems of the present invention can be used to independently verify the results produced by the above statistical analyses.

In one embodiment a method is provided which accesses a first dataset containing attributes associated with a set of query-attribute-positive individuals and query-attribute-negative individuals, the attributes being pangenetic, physical, behavioral and situational attributes associated with individuals, and creates a second dataset of attributes associated with a query-attribute-positive individual but not associated with one or more query-attribute-negative individuals. A third dataset can be created which contains combinations of attributes from the second dataset (i.e., attribute combinations) that are either associated with one or more query-attribute-positive individuals or are not present in any of the query-attribute-negative individuals, along with the frequency of occurrence in the query-attribute-positive individuals and the frequency of occurrence in the query-attribute-negative individuals. Statistical computations based on the frequencies of occurrence can be performed for each attribute combination, where the statistical computation results indicate the strength of association, as measured by one or more well known statistical measures, between each attribute combination and the query attribute. The process can be repeated for a number of query attributes, and multiple query-positive individuals can be studied to create a computer-stored and machine-accessible compilation of different attribute combinations that co-occur with the queried attributes. The compilation can be ranked (i.e., attribute combinations can be assigned individual ranks) and co-occurring attribute combinations not meeting a statistical requirement for strength of association with the query attribute and/or at least a minimum rank can be eliminated from the compilation. The statistical requirement can be a minimum or maximum statistical value and/or a value of statistical significance applied to one or more statistical results. In a further embodiment, ranking the attribute combinations can also be based on the attribute content of the attribute combinations, such as whether certain attributes are present or absent in a particular attribute combination, what percentage of attributes in a particular attribute combination are modifiable, what specific modifiable attributes are present in a particular attribute combination, and/or what types or categories of attributes (i.e., epigenetic, genetic, physical, behavioral, situational) are present in a particular attribute and in what relative percentages. These methods of ranking attribute combinations can be applied in various embodiments of the present invention disclosed herein.
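
One way the ranking described above might look in code is sketched below. It orders combinations first by a statistical result and then by the share of modifiable attributes they contain, and drops those that fall below a minimum rank. The choice of absolute risk as the primary key, the tie-breaking rule, the single-letter attribute names and all values are assumptions for illustration only.

```python
def rank_combinations(stats_by_combo, modifiable):
    """Assign ranks (1 = best) by absolute risk, then by share of modifiable attributes."""
    def sort_key(item):
        combo, stats = item
        modifiable_share = len(set(combo) & modifiable) / len(combo)
        return (stats["absolute_risk"], modifiable_share)

    ordered = sorted(stats_by_combo.items(), key=sort_key, reverse=True)
    return [(rank, combo) for rank, (combo, _) in enumerate(ordered, start=1)]


# Hypothetical combinations (one letter per attribute) and results.
stats_by_combo = {
    "CE": {"absolute_risk": 0.5},
    "CT": {"absolute_risk": 0.5},
    "EY": {"absolute_risk": 0.2},
}
ranked = rank_combinations(stats_by_combo, modifiable={"T"})
kept = [combo for rank, combo in ranked if rank <= 2]  # eliminate ranks worse than 2
print(ranked, kept)
```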

Similarly, a system can be developed which contains a subsystem for accessing a query attribute, a second subsystem for accessing a set of databases containing pangenetic, physical, behavioral, and situational attributes associated with a plurality of query-attribute-positive and query-attribute-negative individuals, a data processing subsystem for identifying combinations of pangenetic, physical, behavioral, and situational attributes associated with query-attribute-positive individuals, but not with query-attribute-negative individuals, and a calculating subsystem for determining a set of statistical results that indicates a strength of association between the combinations of pangenetic, physical, behavioral, and situational attributes and the query attribute. The system can also include a communications subsystem for retrieving at least some of the pangenetic, physical, behavioral, and situational attributes from at least one external database; a ranking subsystem for ranking the co-occurring attributes according to the strength of the association of each co-occurring attribute with the query attribute; and a storage subsystem for storing the set of statistical results indicating the strength of association between the combinations of pangenetic, physical, behavioral, and situational attributes and the query attribute. The various subsystems can be discrete components, configurations of electronic circuits within other circuits, software modules running on computing platforms including classes of objects and object code, or individual commands or lines of code working in conjunction with one or more Central Processing Units (CPUs). A variety of storage units can be used including but not limited to electronic, magnetic, electromagnetic, optical, opto-magnetic and electro-optical storage.

In one application the method and/or system is used in conjunction with a plurality of databases, such as those that would be maintained by health-insurance providers, employers, or health-care providers, which serve to store the aforementioned attributes. In one embodiment the pangenetic (genetic and epigenetic) data is stored separately from the other attribute data and is accessed by the system/method. In another embodiment the pangenetic data is stored with the other attribute data. A user, such as a clinician, physician or patient, can input a query attribute, and that query attribute can form the basis for determination of the attribute combinations associated with that query attribute. In one embodiment the associations will have been previously stored and are retrieved and displayed to the user, with the highest ranked (most strongly associated) combinations appearing first. In an alternate embodiment the calculation is made at the time the query is entered, and a threshold can be used to determine the number of attribute combinations that are to be displayed.

FIG. 13 illustrates a flowchart of one embodiment of a method for creation of a database of attribute combinations, wherein 1st dataset 1322, 2nd dataset 1324, 3rd dataset 1326 and 4th dataset 1328 correspond to 1st dataset 200, 2nd dataset 204, 3rd dataset 206 and 4th dataset 208 respectively of the system illustrated in FIG. 2. Expanded 1st dataset 202 of FIG. 2 is optional for this embodiment of the method and is therefore not illustrated in the flowchart of FIG. 13. One aspect of this method is the comparison of attributes and attribute combinations of different individuals in order to identify those attributes and attribute combinations that are shared in common between those individuals. Any attribute that is present in the dataset record of an individual is said to be associated with that individual.

1st dataset 1322 in the flow chart of FIG. 13 represents the initial dataset containing the individuals' attribute dataset records to be processed by the method.

FIG. 14 illustrates an example of the content of a 1st dataset representing attribute data for 111 individuals. Each individual's association with attributes A-Z is indicated by either an association status value of 0 (no, does not possess the attribute) or a status value of 1 (yes, does possess the attribute). In one embodiment, this is the preferred format for indicating the presence or absence of association of an attribute with an individual. In an alternate embodiment, an individual's attribute profile or dataset record contains the complete set of attributes under consideration and a 0 or 1 status value for each. In other embodiments, representation of association of an attribute with an individual can be more complex than the simple binary value representations of yes or no, or numerical 1 or 0. In one embodiment, the presence of attributes themselves, for example the actual identity of nucleotides, a brand name, or a trait represented by a verbal descriptor, can be used to represent the identity, degree and presence of association of the attribute. In one embodiment, the absence of an attribute is itself an attribute that can be referred to and/or represented as a ‘not-attribute’. In one embodiment, a not-attribute simply refers to an attribute having a status value of 0, and in a further embodiment, the not-attribute is determined to be associated with an individual or present in an attribute profile (i.e., dataset, database or record) if the corresponding attribute has a status value of 0 associated with the individual or is present in the attribute profile as an attribute with a status value of 0, respectively. In another embodiment, a not-attribute can be an attribute descriptor having a ‘not’ prefix, minus sign, or alternative designation imparting essentially the same meaning. In a further embodiment, not-attributes are treated and processed no differently than other attributes. In circumstances where data for an attribute or an attribute's association status cannot be obtained for an individual, the attribute or attribute status may be omitted and represented as a null. Typically, a null should not be treated as being equivalent to a value of zero, since a null is not a value. A null represents the absence of a value, such as when no attribute or attribute association status is entered into a dataset for a particular attribute.
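
The distinction between a status value of 0, a status value of 1 and a null can be sketched as follows. The record layout (a mapping from attribute name to 1, 0 or `None`), the `not-` prefix and the function name `to_attribute_set` are hypothetical choices that follow one of the embodiments described above.

```python
def to_attribute_set(record):
    """Convert a status record into a set of attributes and not-attributes.

    Assumes binary status values: 1 (possesses), 0 (does not possess),
    or None for a null, which is skipped rather than treated as 0.
    """
    attributes = set()
    for name, status in record.items():
        if status is None:  # null: no value was entered, so record nothing
            continue
        attributes.add(name if status == 1 else f"not-{name}")
    return attributes


profile = to_attribute_set({"A": 1, "B": 0, "C": None})
print(sorted(profile))
```

Because the null for C produces neither `C` nor `not-C`, later frequency counts are not biased by treating missing data as absence.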

In the example illustrated in FIG. 14, individuals #1-10 and #111 possess unique attribute content which is not repeated in other individuals of this population. Individuals #11-20 are representative of individuals #21-100, so that the data for each of the individuals #11-20 is treated as occurring ten times in this population of 111 individuals. In other words, there are nine other individuals within the group of individuals #21-100 (not shown in the table) that have A-Z attribute values identical to those of individual #11. The same is true for individuals #12, #13, #14, #15, #16, #17, #18, #19 and #20.

As shown in the flowchart of FIG. 13, in one embodiment the method begins with access query attribute step 1300 in which query attribute 1320, provided either by a user or by automated submission, is accessed. For this example the query attribute is ‘A’. In access data for attribute-positive and attribute-negative groups of individuals step 1302, the attribute data for attribute-positive (i.e., query-attribute-positive) and attribute-negative (i.e., query-attribute-negative) individuals as stored in 1st dataset 1322 are accessed. The differentiation of the two groups of individuals is based upon query attribute 1320 which determines the classification of the individuals as either query-attribute-positive individuals (those individuals that possess the query attribute in their 1st dataset record) or query-attribute-negative individuals (those individuals that do not possess the query attribute in their dataset record). For query attribute ‘A’, individuals #1-10 are the query-attribute-positive individuals, and individuals #11-111 are the query-attribute-negative individuals.

In select attribute-positive individual_(N) step 1304, individual #1 is selected in this example for comparison of their attributes with those of other individuals. In store attributes of individual_(N) that are not present in a portion of the attribute-negative individuals step 1306, those attributes of the selected individual #1 that are not associated with a portion (e.g., one or more; a fraction having a specified value; a percentage such as 0.1%, 1%, 5%, 10%, 15%, 20%, or 25%, or more; or a continuous non-integer value resulting from, for example, a statistical computation) of the query-attribute-negative group of individuals, or a randomly selected subgroup of the query-attribute-negative group of individuals, are stored in 2nd dataset 1324 as potential candidate attributes for contributing to predisposition toward the query attribute. In one embodiment this initial comparison step is used to increase efficiency of the method by eliminating those attributes that are associated with all of the query-attribute-negative individuals. Because such attributes occur with a frequency of 100% in the query-attribute-negative group, they cannot occur at a higher frequency in the query-attribute-positive group and are therefore not candidates for contributing to predisposition toward the query attribute. Therefore, this step ensures that only attributes of the individual that occur with a frequency of less than 100% in the query-attribute-negative group are stored in the 2nd dataset. This step is especially useful for handling genetic attributes since the majority of the approximately three billion nucleotide attributes of the human genome are identically shared among individuals and may be eliminated from further comparison before advancing to subsequent steps.

As mentioned above, this initial comparison to effectively eliminate attributes that are not potential candidates may be performed against a randomly selected subgroup of query-attribute-negative individuals. Using a small subgroup of individuals for the comparison increases efficiency and prevents the need to perform a comparison against the entire query-attribute-negative population which may consist of thousands or even millions of individuals. In one embodiment, such a subgroup preferably consists of at least 20 randomly selected query-attribute-negative individuals, although as few as 10 may be used.
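
Steps 1302 through 1306 can be sketched as a partition followed by a set difference. This is a minimal sketch under assumptions: profiles are modeled as sets of attribute names, the "portion" is taken to be "one or more" negative individuals, and the function names, the `subgroup_size` and `seed` parameters, and the example records are all hypothetical.

```python
import random


def split_by_query(records, query):
    """Step 1302: classify individuals by presence of the query attribute."""
    positive = {i: attrs - {query} for i, attrs in records.items() if query in attrs}
    negative = {i: attrs for i, attrs in records.items() if query not in attrs}
    return positive, negative


def candidate_attributes(individual, negatives, subgroup_size=None, seed=0):
    """Step 1306: keep attributes absent from at least one negative individual."""
    pool = list(negatives)
    if subgroup_size is not None and subgroup_size < len(pool):
        # Optional comparison against a randomly selected subgroup.
        pool = random.Random(seed).sample(pool, subgroup_size)
    shared_by_all = set.intersection(*pool) if pool else set()
    return set(individual) - shared_by_all


# Hypothetical records keyed by individual number.
records = {1: {"A", "C", "E", "G"}, 2: {"C", "G"}, 3: {"E", "G"}}
positive, negative = split_by_query(records, "A")
candidates = candidate_attributes(positive[1], negative.values())
print(sorted(candidates))
```

Attribute G is dropped because every negative individual has it. Note that comparing against a small subgroup trades accuracy for speed: an attribute shared by the whole subgroup is discarded even if some negative individual outside the subgroup lacks it.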

For the present example, only those attributes having a status value of 1 for individual #1 and a status value of 0 for one or more query-attribute-negative individuals are stored as potential candidate attributes, but in one embodiment those attributes having a status value of 0 for individual #1 and a status value of 1 for one or more query-attribute-negative individuals (i.e., attributes I, K, Q and W) can also be stored as candidate attributes, and may be referred to as candidate not-attributes of individual #1. FIG. 15A illustrates the 2nd dataset which results from processing the attributes of individual #1 for query attribute ‘A’ in a comparison against individuals #11-111 of the query-attribute-negative subgroup. The stored candidate attributes consist of C, E, F, N, T and Y. FIG. 15B illustrates a tabulation of all possible combinations of these attributes. In store combinations of attributes and their frequencies of occurrence for both groups of individuals step 1308, those combinations of attributes of 2nd dataset 1324 that are found by comparison to be associated with one or more query-attribute-positive individuals of 1st dataset 1322 are stored in 3rd dataset 1326 along with the corresponding frequencies of occurrence for both groups determined during the comparison. Although not relevant to this example, there may be instances in which a particular attribute combination is rare enough, or the group sizes small enough, that the selected query-attribute-positive individual is the only individual that possesses that particular attribute combination. Under such circumstances, no other individual of the query-attribute-positive group and no individual of the query-attribute-negative group will be found to possess that particular attribute combination. To ensure that the attribute combination is stored as a potential predisposing attribute combination, one embodiment of the method can include a requirement that any attribute combination not present in any of the query-attribute-negative individuals be stored in the 3rd dataset along with the frequencies of occurrence for both groups. Any attribute combination stored according to this rule necessarily has a frequency of occurrence equal to zero for the query-attribute-negative group and a frequency of occurrence having a numerator equal to one for the attribute-positive group.
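
Step 1308 can be sketched as an enumeration of every non-empty combination of the candidate attributes followed by a tally in each group. For the six candidates C, E, F, N, T and Y this gives 2^6 − 1 = 63 combinations. The function names and the small example groups are hypothetical; frequencies are stored as (numerator, denominator) pairs.

```python
from itertools import combinations


def all_combinations(attributes):
    """Yield every non-empty combination of the candidate attributes."""
    ordered = sorted(attributes)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def combination_frequencies(candidates, positives, negatives):
    """Build 3rd-dataset rows: combination -> (positive ratio, negative ratio)."""
    table = {}
    for combo in all_combinations(candidates):
        a = sum(combo <= profile for profile in positives)
        b = sum(combo <= profile for profile in negatives)
        # Store if seen in a positive individual, or absent from all negatives.
        if a >= 1 or b == 0:
            table[combo] = ((a, len(positives)), (b, len(negatives)))
    return table


print(len(list(all_combinations("CEFNTY"))))

# Hypothetical groups, for illustration only.
positives = [{"C", "E"}, {"C"}]
negatives = [{"C"}, {"E"}, set()]
table = combination_frequencies({"C", "E"}, positives, negatives)
print(table[frozenset("CE")])
```

The number of combinations doubles with each added candidate, which is one reason the earlier elimination of universally shared attributes matters.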

FIG. 16 illustrates a 3rd dataset containing a representative portion of the stored attribute combinations and their frequencies of occurrence for the data of this example. Each frequency of occurrence is preferably stored as a ratio of the number of individuals of a group that are associated with the attribute combination in the numerator and the total number of individuals of that group in the denominator.

In store statistical results indicating the strength of association between each attribute combination and the query attribute step 1310, the frequencies of occurrence previously stored in 3rd dataset 1326 are used to compute statistical results for the attribute combinations which indicate the strength of association of each attribute combination with the query attribute. As mentioned previously, the statistical computations used may include prevalence, incidence, absolute risk (a.k.a. probability), attributable risk, excess risk, relative risk, odds and odds ratio. In one embodiment, absolute risk, relative risk, odds and odds ratio are the statistical computations performed (see formulas in FIG. 12B). Computed statistical results stored with their corresponding attribute combinations are shown in the 3rd dataset illustrated by FIG. 16. The odds and odds ratio computations for the attribute combinations CEFNTY, CEFNT, CEFNY, CFNTY and CEFN are shown as undefined in this 3rd dataset example because the frequencies of occurrence of these attribute combinations in the query-attribute-negative group are zero.

For the sake of brevity, only the individual #1 was selected and processed in the method, thereby determining only the predisposing attribute combinations of individual #1 and those individuals of the group that also happen to possess one or more of those attribute combinations. However, one can proceed to exhaustively determine all predisposing attribute combinations in the query-attribute-positive group and build a complete 3rd dataset for the population with respect to query attribute ‘A’. As shown in the flow chart of FIG. 13, this is achieved by simply including all attribute-positive individuals processed? step 1312 to provide a choice of selecting successive individuals from the query-attribute-positive group and processing their attribute data through successive iteration of steps 1300-1310 one individual at a time until all have been processed. The selection of successive individuals may include the selection of every individual in the query-attribute-positive group, or alternatively, may be restricted to a randomly or non-randomly selected representative subset of individuals from the query-attribute-positive group of individuals. The resulting data for each additional individual is simply appended into the 3rd dataset during each successive iteration. When selecting and processing multiple individuals, data in the 2nd dataset is preferably deleted between iterations, or uniquely identified for each individual. This will ensure that any data in the 2nd dataset originating from a previous iteration is not reconsidered in current and subsequent iterations of other individuals in the group. Alternate techniques to prevent reconsideration of the data can be utilized.

In store significantly associated attribute combinations step 1314, 4th dataset 1328 may be created by selecting and storing only those attribute combinations and their associated data from the 3rd dataset having a minimum statistical association with the query attribute. The minimum statistical association can be a positive, negative or neutral association, or combination thereof, as determined by the user or the system. This determination can be made based on the statistical results previously stored in 3rd dataset 1326. As an example, the determination can be made based on the results computed for relative risk. Statistically, a relative risk of >1.0 indicates a positive association between the attribute combination and the query attribute, while a relative risk of 1.0 indicates no association, and a relative risk of <1.0 indicates a negative association.

FIG. 17 illustrates a 4th dataset consisting of attribute combinations with a relative risk >1.0, from which the attribute combinations CETY and CE are excluded because they have associated relative risks less than or equal to 1.0. FIG. 18 illustrates another example of a 4th dataset that can be created. In this example, a minimum statistical association requirement of either relative risk >4.0 or absolute risk >0.3 produces this 4th dataset.
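The selection of a 4th dataset from a 3rd dataset can be sketched as a simple threshold filter. The function name `select_associated`, the record layout, and the numeric values below are hypothetical and are not taken from FIG. 16, 17 or 18; only the two example criteria (relative risk >1.0 alone, and relative risk >4.0 or absolute risk >0.3) come from the description above.

```python
def select_associated(third_dataset, min_relative_risk=None, min_absolute_risk=None):
    """Keep combinations exceeding at least one of the supplied thresholds."""
    fourth_dataset = {}
    for combination, stats in third_dataset.items():
        exceeds_rr = (min_relative_risk is not None
                      and stats["relative_risk"] > min_relative_risk)
        exceeds_ar = (min_absolute_risk is not None
                      and stats["absolute_risk"] > min_absolute_risk)
        if exceeds_rr or exceeds_ar:
            fourth_dataset[combination] = stats
    return fourth_dataset


# Hypothetical 3rd dataset; combination labels and values are invented for illustration.
third = {
    "PQ":  {"absolute_risk": 0.50, "relative_risk": 5.0},
    "PQR": {"absolute_risk": 0.35, "relative_risk": 2.0},
    "QR":  {"absolute_risk": 0.10, "relative_risk": 1.0},
}
positive_only = select_associated(third, min_relative_risk=1.0)
either_criterion = select_associated(third, min_relative_risk=4.0, min_absolute_risk=0.3)
```

Leaving a threshold as `None` disables that criterion, which mirrors the point that the choice of measure and degree of association can be left to the user or the application.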

It can be left up to the user or made dependent on the particular application as to which statistical measure and what degree of statistical association is used as the criteria for determining inclusion of attribute combinations in the 4th dataset. In this way, 4th dataset 1328 can be presented in the form of a report which contains only those attribute combinations determined to be predisposing toward the query attribute above a selected threshold of significant association for the individual or population of individuals.

In many applications it will be desirable to determine predisposing attribute combinations for additional query attributes within the same population of individuals. In one embodiment this is accomplished by repeating the entire method for each additional query attribute and either creating new 2nd, 3rd and 4th datasets, or appending the results into the existing datasets with associated identifiers that clearly indicate what data results correspond to which query attributes. In this way, a comprehensive database containing datasets of predisposing attribute combinations for many different query attributes may be created.

In one embodiment of a method for creating an attribute combinations database, attribute profile records of individuals that have nulls for one or more attribute values are not processed by the method or are eliminated from the 1st dataset before initiating the method. In another embodiment, attribute profile records of individuals that have nulls for one or more attribute values are only processed by the method if those attribute values that are nulls are deemed inconsequential for the particular query or application. In another embodiment, a population of individuals having one or more individual attribute profile records containing nulls for one or more attribute values is only processed for those attributes that have values (non-nulls) for every individual of that population.

In one embodiment of a method for creating an attribute combinations database, frequencies of occurrence and statistical results for strength of association of existing attribute combinations in the attribute combinations dataset are updated based on the attribute profile of an individual processed by the method. In another embodiment, frequencies of occurrence and statistical results for strength of association of existing attribute combinations in the attribute combinations dataset are not updated based on the attribute profile of an individual processed by the method. In another embodiment, the processing of an individual by the method can require first comparing the individual's attribute profile to the preexisting attribute combinations dataset to determine which attribute combinations in the dataset are also present in the individual's attribute profile, and then in a further embodiment, based on the individual's attribute profile, updating the frequencies of occurrence and statistical results for strength of association of those attribute combinations in the dataset that are also present in the individual's attribute profile, without further processing the individual or their attributes by the method.

The 3rd and 4th datasets created by performing the above methods for creation of a database of attribute combinations can be used for additional methods of the invention that enable: 1) identification of predisposing attribute combinations toward a key attribute of interest, 2) predisposition prediction for an individual toward a key attribute of interest, and 3) destiny modification provided as predisposition predictions resulting from the addition or elimination of specific attribute associations.

A method for compiling an attribute combination database that requires determining all possible combinations of attributes that can be formed from an attribute profile, and then computing the strength of association of each of those attribute combinations with the query attribute, can present a considerable computational challenge. For example, forming all possible subcombinations of 50 attributes from an attribute profile comprising just 100 attributes requires a minimum of 1×10²⁹ operations (i.e., 100 choose 50 = 1×10²⁹), which would be expected to take 3.2×10⁶ years of computing time on a 1 petaFLOPS supercomputer. One method for streamlining the identification of attribute combinations that co-associate with a query attribute is to compare attribute profiles with one another and only evaluate those attribute combinations which constitute the intersection in attribute content (i.e., shared attribute combinations) between the attribute profiles. This approach eliminates the computational expense of forming attribute combinations that are unique to only a single attribute profile.
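The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes one operation per subcombination and a machine sustaining 10¹⁵ operations per second; the variable names are illustrative only.

```python
from math import comb

OPERATIONS_PER_SECOND = 1e15             # assumed sustained rate of a 1 petaFLOPS machine
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # Julian year

# Number of 50-attribute subcombinations that can be formed from 100 attributes.
subcombinations = comb(100, 50)

# Assumes a single operation per subcombination, the minimum stated in the text.
years = subcombinations / OPERATIONS_PER_SECOND / SECONDS_PER_YEAR

print(f"100 choose 50 is about {subcombinations:.1e}")
print(f"computing time is about {years:.1e} years")
```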

One approach to determining co-associating attributes requires determining the intersection of attributes for every possible combination of attribute profiles that can be formed from a set of attribute profiles. Briefly, this method requires forming all possible 2-tuple combinations of attribute profiles from the set of attribute profiles and comparing the attribute profiles within each 2-tuple. The largest combination of attributes that occurs within both attribute profiles of each 2-tuple is identified and stored as the largest attribute combination co-occurring in that 2-tuple. Next, all possible 3-tuple combinations of the attribute profiles are formed. For each 3-tuple, the largest attribute combination that occurs within all three attribute profiles of that 3-tuple combination is identified and stored as the largest attribute combination co-occurring in that 3-tuple. Next, 4-tuples are formed and the largest co-occurring attribute combination within each 4-tuple is identified. This approach is repeated for progressively larger tuples by simply increasing the n-tuple size by one at each step. Computational burden can be reduced in part by incorporating a requirement that prevents the formation of any (n+1)-tuple combination from an n-tuple combination for which no co-occurring attribute combination was identified. With this requirement, identification of attribute combinations is completed at the point at which every n-tuple combination generated at a particular step is null for a co-occurring attribute combination (i.e., not a single one of the newly generated n-tuple combinations contains attribute profiles having at least one shared attribute combination in common).
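A minimal sketch of this n-tuple procedure, including the pruning requirement, is given below. It assumes each attribute profile is represented as a set of attribute labels keyed by a sortable identifier; the function name `shared_by_tuples` and the toy profiles are hypothetical.

```python
def shared_by_tuples(profiles):
    """Largest attribute combination shared by each n-tuple of profiles (n >= 2).

    An (n+1)-tuple is only formed by extending an n-tuple whose shared
    combination is non-empty, so the procedure stops once a step yields
    no tuple with a shared combination.
    """
    ids = sorted(profiles)
    current = {(pid,): frozenset(profiles[pid]) for pid in ids}  # 1-tuples seed the process
    results = {}
    while current:
        extended = {}
        for members, shared in current.items():
            for pid in ids:
                if pid > members[-1]:                 # keep tuples in sorted order: no duplicates
                    intersection = shared & profiles[pid]
                    if intersection:                  # pruning: drop null intersections
                        extended[members + (pid,)] = intersection
        results.update(extended)
        current = extended
    return results


# Toy profiles, invented for illustration.
toy_profiles = {1: {"A", "B", "C"}, 2: {"B", "C", "D"}, 3: {"C", "E"}}
toy_shared = shared_by_tuples(toy_profiles)
```

Even with pruning, the number of tuples examined can grow very rapidly when many profiles share at least one attribute, which is the concern the surrounding text raises about this approach.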

The shortcomings of the immediately previous method are two-fold. The first shortcoming relates to the very large number of attribute comparisons that may be required in the initial step alone. For example, when comparing 1,000 genetic attribute profiles comprising 1 million SNPs per attribute profile, 5×10¹¹ individual attribute comparisons are required just for the initial step of comparing all possible pairs of the 1,000 genetic attribute profiles ((5×10⁵ possible pairings of attribute profiles)×(10⁶ attributes per attribute profile) = 5×10¹¹ individual attribute comparisons). If each attribute profile contained the full complement of 3 billion nucleotides of whole genomic sequence, then 1.5×10¹⁵ individual attribute comparisons would be required in the first step of comparing all possible pairs of attribute profiles, resulting in a computationally intensive method requiring a supercomputer. The second shortcoming of this particular method is that it only identifies the largest attribute combination that is shared within each n-tuple combination of attribute profiles. The method does not enable identification of smaller attribute combinations, contained within each largest identified attribute combination, which may be responsible for the bulk of the strength of association of the larger attribute combinations with the query attribute. A smaller attribute combination would not be identified by this particular method unless there is at least one individual that possesses only that smaller attribute combination without having any of the other attributes present in the larger attribute combination. 
To exemplify this deficiency, consider a query-attribute-positive group consisting of genetically identical individuals (i.e., identical siblings or clones) all having blue eyes, for which the submitted query attribute is blue eyes. Applying the above method to process the genetic attribute profiles of these query-attribute-positive individuals would yield an attribute combination potentially containing their entire genomic sequence, since that is the largest attribute combination shared in common between these genetically identical individuals. Such a large combination of attributes yields little or no useful information about which particular attributes directly predispose an individual to having blue eye color. Although this is an extreme example, it clearly demonstrates a deficiency of this approach. The above shortcomings limit the usefulness of this approach for determining attribute combinations associated with a query attribute and make it a nonpreferred method.

It is desirable that a method for compiling co-associating attributes identify not only the largest attribute combinations shared by attribute profiles, but also smaller attribute combinations, to determine the smallest and most strongly associated core attribute combinations that co-associate with a particular query attribute. A core attribute combination can, for example, be defined as the smallest subset of attributes having a statistically significant association with the query attribute. An alternative definition of a core attribute combination can be the smallest subset of attributes that confers an absolute risk of association with the query attribute above a predetermined threshold. Other definitions of a core attribute combination can be formulated, for example, based on needs arising from user implementation, population and sample sizes, statistical constraints, or available computing power. Identification of this core attribute combination and its attribute content is of great importance because a core attribute combination should contain attributes that directly predispose the individual toward association with the query attribute. Subsets of attributes from this core attribute combination may therefore provide the most efficient and direct means of acquiring or eliminating an association with the query attribute, which is central to effectively modifying an individual's predisposition toward that query attribute.

In one embodiment of a computationally efficient method for compiling co-associating attributes, attribute combinations associated with a query attribute, including core attribute combinations, are identified without the need for supercomputing, even when evaluating populations comprising millions of individuals and attribute profiles each comprising billions of attributes. To help accomplish this, a representative subset of query-attribute-positive attribute profiles can be selected from a larger set of query-attribute-positive attribute profiles. The representative subset of attribute profiles can be used to identify candidate attributes and attribute combinations associated with the query attribute much more efficiently than using the entire set of query-attribute-positive attribute profiles, while still providing the potential to identify relevant co-associating attributes. While not absolutely required, selecting a representative subset of attribute profiles may be advantageous when the set of query-attribute-positive attribute profiles includes thousands or millions of attribute profiles. The selection of a subset of query-attribute-positive attribute profiles can be a random selection or another appropriate and/or statistically valid method of selection. The size of this subset can vary, but for example, can comprise as few as 10 or as many as 100 or more attribute profiles. There may be several very different core attribute combinations associated with a given query attribute, potentially representing different pathways to achieve association with that query attribute. In a case where three or fewer core attribute combinations are expected to be associated with a given query attribute, as few as 10 randomly selected query-attribute-positive attribute profiles may enable the identification of those attribute combinations. 
If it is expected that more than three core attribute combinations are associated with the query attribute, then selecting a higher number of query-attribute-positive attribute profiles for the subset may be advisable.

In one embodiment of a computationally efficient method for compiling co-associating attributes, a very beneficial step to the successful and efficient identification of co-associating attributes involves eliminating consideration of attributes in query-attribute-positive attribute profiles that also occur in a large portion of the query-attribute-negative attribute profiles. As previously described herein, this can be accomplished by comparing one or more query-attribute-positive attribute profiles with an appropriately selected (e.g., randomly selected) subset of query-attribute-negative individuals to eliminate those attributes possessed by query-attribute-positive individuals that occur at a high frequency in the query-attribute-negative group (for example at 80% or greater frequency) and are therefore likely to either have no association with the query attribute, or a negative association. Failure to eliminate such commonly occurring attributes may add complexity to an attribute combination without increasing the strength of association of its core attribute combination with the query attribute. It is therefore advantageous to eliminate such attributes initially, in order to arrive at determination of the core attribute combinations as quickly, efficiently and accurately as possible. While not absolutely required, this approach greatly increases efficiency when comparing numerous attribute profiles each containing large numbers of attributes, as for example when processing whole genomic attribute profiles of a large population where each attribute profile contains at least 3 billion nucleotide attributes which on average will be 99.9% identical between any given pair of individuals. The comparison of a query-attribute-positive attribute profile with a subset of query-attribute-negative attribute profiles can identify a subset of attributes from the query-attribute-positive attribute profile that do not occur in a portion of the query-attribute-negative attribute profiles. 
This identified subset of attributes can be referred to as a set of candidate attributes. A set of candidate attributes can be further processed to identify combinations of the candidate attributes that co-associate with the query attribute.
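The elimination of commonly occurring attributes can be sketched as a frequency filter. The sketch assumes attribute profiles are sets of attribute labels; the function name `candidate_attributes`, the parameter `max_negative_frequency` and the toy data are hypothetical, while the 80% cutoff is the example value given above.

```python
def candidate_attributes(positive_profile, negative_profiles, max_negative_frequency=0.8):
    """Attributes of a query-attribute-positive profile that remain after
    removing those occurring at or above the cutoff frequency in a
    (sub)set of query-attribute-negative profiles."""
    total = len(negative_profiles)
    candidates = set()
    for attribute in positive_profile:
        occurrences = sum(1 for profile in negative_profiles if attribute in profile)
        # With no negative profiles to compare against, nothing can be eliminated.
        if total == 0 or occurrences / total < max_negative_frequency:
            candidates.add(attribute)
    return candidates


# Toy data, invented for illustration: "A" occurs in 4 of 5 negative profiles (80%).
positive = {"A", "B", "C"}
negatives = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"A"}, {"D"}]
toy_candidates = candidate_attributes(positive, negatives)
```

In this toy case "A" is eliminated because it reaches the 80% cutoff, while "B" (60%) and "C" (0%) are retained as candidate attributes.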

In a further embodiment of a computationally efficient method for compiling co-associating attributes, a divide-and-conquer approach can be used to greatly increase the efficiency of identifying attribute combinations that are associated with a query attribute. This approach partitions (subdivides, divides up, or segments) a set of attribute profiles into subsets of attribute profiles, each subset comprising those attribute profiles that share the most attributes in common. Each iteration of the divide-and-conquer approach partitions the query-attribute-positive set (or subset) of attribute profiles into at least two subsets, and multiple iterations can be used to generate additional subsets. The attribute profiles that comprise each subset are evaluated to identify the largest attribute combination that they share in common. Initially a first query-attribute-positive attribute profile is selected from the query-attribute-positive set of attribute profiles. As an example using a set of 10 attribute profiles, a first attribute profile is selected from the set of 10 attribute profiles. This first attribute profile, attribute profile #1, can then be used in a series of pairwise comparisons with each of the other query-attribute-positive attribute profiles in the set. In a preferred embodiment, all possible pairwise comparisons of the first attribute profile with the other attribute profiles are performed. In this example, the possible pairings are {#1,#2}, {#1,#3}, {#1,#4}, {#1,#5}, {#1,#6}, {#1,#7}, {#1,#8}, {#1,#9}, and {#1,#10}, for a total of nine pairwise attribute profile comparisons. If each of the 10 individuals has an associated attribute profile consisting of 10⁶ attributes, then this example would require performing 9×10⁶ individual attribute comparisons (9 paired attribute profiles×10⁶ attributes per attribute profile). 
Sets of attributes (i.e., attribute combinations) constituting the intersection in content between the two attribute profiles of each pairwise comparison can be stored to generate a first set of attribute combinations, wherein each attribute combination can be stored in association with the pair of attribute profiles from which it was generated. The largest attribute combination occurring in the first set of attribute combinations can be identified and referred to as the primary attribute combination. As an example, if the largest intersection of attributes occurs in the paired comparison {#1,#4}, then this intersection produces the primary attribute combination for the subset of attribute profiles #1-#10 under consideration. This primary attribute combination can serve as the basis for partitioning the query-attribute-positive set of attribute profiles into subsets of attribute profiles, one of which can include attribute profiles that are most similar to #1 and #4. This is achieved by using the primary attribute combination in a series of pairwise comparisons with each of the other attribute combinations previously stored in the first set of attribute combinations. Sets of attributes constituting the intersection in content between the two attribute combinations of each pairwise comparison are stored to generate a second set of attribute combinations, wherein each attribute combination is stored in association with the three corresponding attribute profiles from which it was generated. 
Continuing from the example above, by using the primary attribute combination corresponding to {#1,#4} in pairwise comparisons with each of the other attribute combinations in the first set corresponding to {#1,#2}, {#1,#3}, {#1,#5}, {#1,#6}, {#1,#7}, {#1,#8}, {#1,#9}, and {#1,#10}, the resulting eight intersections of attributes corresponding to the triplets of attribute profiles {#1,#2,#4}, {#1,#3,#4}, {#1,#4,#5}, {#1,#4,#6}, {#1,#4,#7}, {#1,#4,#8}, {#1,#4,#9}, and {#1,#4,#10} can be stored as a second set of attribute combinations. The query-attribute-positive subset of attribute profiles can then be divided into at least two subsets based on the sizes of the attribute combinations in the second set as compared with the size of the primary attribute combination. More specifically, the attribute profiles which correspond to attribute combinations in the second set that are equal to or larger than a predetermined fraction of the size of the primary attribute combination, for example those that are at least 50% of the size of the primary attribute combination, can be assigned to a first subset of attribute profiles, while the attribute profiles corresponding to the remaining attribute combinations which are less than the predetermined fraction of the size of the primary attribute combination, for example those that are less than 50% of the size of the primary attribute combination, can be assigned to a second subset of attribute profiles. By doing this, the attribute profiles that are most similar to the two attribute profiles which generated the primary attribute combination in the current iteration are clustered together into the first subset. The choice of 50% as the predetermined fraction of the size of the primary attribute combination is arbitrary in these examples, and can be adjusted higher or lower to respectively increase or decrease the degree of similarity desired of attribute profiles that are partitioned into a subset. 
As such, the predetermined fraction of the size of the primary attribute combination essentially acts as a stringency parameter for including and excluding attribute profiles from the subsets, and it can have substantial influence on the number of attribute profiles partitioned into each subset, as well as the number of subsets that will ultimately be formed.

Continuing with the above example in which the primary attribute combination was derived from comparison of attribute profiles #1 and #4, the first subset will include attribute profiles #1 and #4 as well as any other attribute profiles that correspond with attribute combinations in the second set that are at least 50% of the size of that primary attribute combination. For this example, assume that attribute profile triplets {#1,#4,#6} and {#1,#4,#9} are associated with attribute combinations in the second set that are equal to or greater than 50% of the size of the primary attribute combination. Attribute profiles #6 and #9 would therefore be included in the first subset of attribute profiles along with attribute profiles #1 and #4. Attribute profiles #2, #3, #5, #7, #8, and #10 on the other hand are assigned to the second subset because they are associated with attribute combinations in the second set that are less than 50% of the size of the primary attribute combination. The largest attribute combination shared by the attribute profiles of the first subset can then be stored as a candidate attribute combination in a set of candidate attribute combinations.

The attribute profiles in the second subset can then be processed through a reiteration of the method, where the second subset can be redesignated as the subset of attribute profiles, a new first attribute profile can be selected from this subset of attribute profiles, a new first set of attribute combinations can be generated from pairwise comparison of the first attribute profile with the other attribute profiles of this subset, a new primary attribute combination can be determined, a new second set of attribute combinations can be generated from the pairwise comparison of the primary attribute combination with the other attribute combinations in the first set of attribute combinations, and the current subset of attribute profiles can be divided into a new first subset and a new second subset based on the comparison of each of the attribute combinations in the second set with the primary attribute combination. The largest attribute combination occurring in all the attribute profiles of the new first subset can be stored as a candidate attribute combination in the set of candidate attribute combinations. Reiteration can continue in this manner, beginning with the current second subset redesignated as the subset of attribute profiles, until an iteration is reached where a new second subset containing one or more attribute profiles cannot be formed (i.e., the new second subset formed is an empty/null set).

To exemplify this reiteration process continuing with the attribute profiles from the above example, the second subset comprising attribute profiles #2, #3, #5, #7, #8, and #10 is redesignated as the subset of attribute profiles, and attribute profile #2 can be selected as a first attribute profile for this subset. The selected attribute profile #2 is then used to determine the attribute intersections of the five pairwise attribute profile comparisons corresponding to {#2,#3}, {#2,#5}, {#2,#7}, {#2,#8}, and {#2,#10}. Assuming attribute profiles #5 and #10 are found to cluster with attribute profile #2 as a result of evaluating the intersection in attribute content of the pairwise comparisons as described above, partition of this subset of attribute profiles creates a new first subset containing attribute profiles #2, #5 and #10, and a new second subset containing attribute profiles #3, #7, and #8. The largest attribute combination corresponding to the intersection of attribute profiles #2, #5 and #10 is stored as a candidate attribute combination in the set of candidate attribute combinations. Reiterative processing of the second subset comprising attribute profiles #3, #7 and #8 proceeds with attribute profile #3 selected as the first attribute profile, which is then used to perform the two pairwise comparisons {#3,#7} and {#3,#8}. Assuming a comparison finds these three attribute profiles to cluster together, no new second subset can be generated. The largest attribute combination corresponding to the intersection of attribute profiles #3, #7 and #8 is stored as a candidate attribute combination in the set of candidate attribute combinations. 
Frequencies of occurrence of each of the candidate attribute combinations that were generated and stored in the set of candidate attribute combinations can be determined in the query-attribute-positive set of attribute profiles and in the query-attribute-negative set of attribute profiles so that strength of association of the candidate attribute combinations with the query attribute can be determined and used as desired for other methods.
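The divide-and-conquer partitioning and its reiteration can be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: attribute profiles are modeled as sets of attribute labels keyed by profile number, the first profile of each iteration is simply the first one remaining, a lone remaining profile is treated as its own subset, and the attribute contents of the ten toy profiles are invented so that they cluster as in the example above. The names `divide_and_conquer` and `fraction` are hypothetical.

```python
def divide_and_conquer(profiles, fraction=0.5):
    """Partition query-attribute-positive profiles into subsets and return,
    for each subset, (profile ids, largest attribute combination shared by all)."""
    remaining = dict(profiles)
    candidates = []
    while remaining:
        ids = list(remaining)
        first, others = ids[0], ids[1:]
        if not others:  # assumption: a lone remaining profile forms its own subset
            candidates.append(({first}, set(remaining[first])))
            break
        # First set: intersection of the first profile with every other profile.
        first_set = {pid: remaining[first] & remaining[pid] for pid in others}
        # Primary attribute combination: the largest intersection in the first set.
        partner = max(first_set, key=lambda pid: len(first_set[pid]))
        primary = first_set[partner]
        # Second set: primary combination intersected with the other combinations.
        # Profiles whose result reaches the stringency threshold join the first subset.
        # (If the primary combination is empty, every profile joins.)
        first_subset = {first, partner}
        for pid in others:
            if pid != partner and len(primary & first_set[pid]) >= fraction * len(primary):
                first_subset.add(pid)
        shared = set.intersection(*(set(remaining[pid]) for pid in first_subset))
        candidates.append((first_subset, shared))
        # The second subset is redesignated as the subset for the next iteration.
        remaining = {pid: attrs for pid, attrs in remaining.items() if pid not in first_subset}
    return candidates


# Invented attribute contents for profiles #1-#10.
example_profiles = {
    1: {"a1", "a2", "a3", "a4", "a5", "a6"},
    2: {"b1", "b2", "b3", "b4", "a1"},
    3: {"c1", "c2", "c3"},
    4: {"a1", "a2", "a3", "a4", "a5", "a6", "x4"},
    5: {"b1", "b2", "b3", "b4"},
    6: {"a1", "a2", "a3", "a4", "x6"},
    7: {"c1", "c2", "c3", "c4"},
    8: {"c1", "c2"},
    9: {"a1", "a2", "a3", "x9"},
    10: {"b1", "b2", "b3"},
}
candidate_combinations = divide_and_conquer(example_profiles)
```

With these toy profiles the three iterations yield the subsets {#1,#4,#6,#9}, {#2,#5,#10} and {#3,#7,#8}. Raising `fraction` makes the clustering more stringent and tends to produce more, smaller subsets.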

By clustering the attribute profiles into subsets, the divide-and-conquer approach substantially increases efficiency because no comparisons of attribute profiles are performed across subsets. Consequently, the number of attribute profile comparisons required by the divide-and-conquer approach is much less than that required by just the first step of the nonpreferred method described previously, which compares all possible combinations of attribute profiles that can be formed from a set of attribute profiles. To demonstrate this, consider again the above example which used the divide-and-conquer approach to partition a set of 10 query-attribute-positive attribute profiles into three nearly equally sized subsets of attribute profiles to generate three candidate attribute combinations. That example required a total of 16 pairwise comparisons of attribute profiles over three iterations (9+5+2=16). In contrast, the nonpreferred method would require 45 pairwise comparisons of attribute profiles in its first step (10 choose 2 = 45). When processing a much larger set, for example a set of 1,000 query-attribute-positive attribute profiles, the divide-and-conquer approach would require 1,996 pairwise attribute profile comparisons in a scenario in which the 1,000 attribute profiles cluster into three nearly equally sized subsets of attribute profiles (999+665+332=1,996), while the nonpreferred method would require 499,500 pairwise comparisons in its first step (1,000 choose 2 = 499,500). Therefore, for a given number of subsets, the number of pairwise comparisons required by the divide-and-conquer approach increases approximately linearly with the number of attribute profiles in the query-attribute-positive set, while the first step of the nonpreferred method alone increases quadratically and its later steps, which form progressively larger n-tuples, can increase combinatorially. This represents a tremendous advantage in computational efficiency of the divide-and-conquer approach.
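The comparison counts quoted in this example can be reproduced with a short calculation. The helper name `divide_and_conquer_comparisons` is hypothetical, and the subset sizes are assumed to be given in the order in which the subsets are split off; only the pairwise comparisons of attribute profiles are counted, as in the text.

```python
from math import comb

def divide_and_conquer_comparisons(subset_sizes):
    """Pairwise profile comparisons when subsets of the given sizes are
    split off one per iteration (first profile versus all others remaining)."""
    remaining = sum(subset_sizes)
    total = 0
    for size in subset_sizes:
        total += remaining - 1   # first profile compared with every other remaining profile
        remaining -= size        # the clustered subset is removed before the next iteration
    return total


small_dc = divide_and_conquer_comparisons([4, 3, 3])          # 9 + 5 + 2
large_dc = divide_and_conquer_comparisons([334, 333, 333])    # 999 + 665 + 332
small_all_pairs = comb(10, 2)
large_all_pairs = comb(1000, 2)
print(small_dc, small_all_pairs, large_dc, large_all_pairs)   # 16 45 1996 499500
```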

In one embodiment, a plurality of sets of attributes (e.g., attribute profiles) are evaluated and clustered into subsets according to the divide-and-conquer approach described herein, wherein the subsets formed can be mapped to a first half and second half of the plurality of sets of attributes by clustering the two most similar attribute sets with other attribute sets that are highly similar to those two. Alternatively, other clustering methods which look for similarities and which provide a basis for aggregation of attributes can be used (e.g., seeding). In one embodiment all attributes are given binary values (present or not present) and the clustering is performed based on the presence of combinations of attributes within the query-attribute-positive group. In an alternate embodiment some attributes are continuous or multi-valued (e.g., obesity) and described on a continuous value or discrete multi-valued basis. A number of clustering algorithms, including but not limited to K-means clustering, as well as determination of similarity measures including geometric distance or angles can be used to determine one or more of the subsets. Additionally, seeding techniques can be used to generate subsets, for example by requiring that one or more attribute profiles that nucleate formation of one or more subsets contain a minimal specified or predetermined set of attributes (i.e., a core attribute combination). In one embodiment, if a particular attribute or set of attributes is known to be causally associated with a particular outcome (i.e., a query attribute), that attribute or set of attributes can be used as the basis for clustering attributes, attribute profiles, and/or individuals into subsets (clusters).

Each candidate attribute combination generated by the divide-and-conquer approach constitutes the largest combination of attributes occurring within all of the attribute profiles of a particular subset of attribute profiles. As explained previously, the largest attribute combination identified may contain smaller combinations of attributes (i.e., core attribute combinations) that also co-associate with the query attribute. A further embodiment of a computationally efficient method for compiling co-associating attributes is able to identify core attribute combinations, contained within a larger candidate attribute combination for example, using a top-down approach. These smaller core attribute combinations, by virtue of the way in which they are identified, can contain attributes which are the most essential attributes for contributing to co-association with the query attribute. Candidate attribute combinations determined by the divide-and-conquer approach are preferably used as the starting point for identifying core attribute combinations. The following top-down approach to identifying a core attribute combination begins with generating subcombinations of attributes selected from a candidate attribute combination, the number of attributes in each subcombination being less than that of the candidate attribute combination. In one embodiment, the number of attributes in each attribute subcombination is one less than the candidate attribute combination from which the attributes are selected. In a further embodiment, all possible attribute subcombinations containing one less attribute than the candidate attribute combination are generated, so that for every attribute comprising the candidate attribute combination there will be exactly one attribute subcombination generated which lacks that attribute. 
The frequencies of occurrence of each of the candidate attribute combinations and attribute subcombinations can be determined in the query-attribute-positive set of attribute profiles and in the query-attribute-negative set of attribute profiles, and based on the frequencies of occurrence, each subcombination having a lower strength of association with the query attribute than the candidate attribute combination from which it was generated is identified. A lower strength of association would be expected to result from an increased frequency of occurrence, in the query-attribute-negative set of attribute profiles, of the attribute subcombination relative to the candidate attribute combination from which it was generated. Because each attribute subcombination is missing at least one attribute relative to the candidate attribute combination from which it was generated, a missing attribute can be readily identified as a core attribute responsible for the lower strength of association since it constitutes the only difference between the attribute subcombination and the candidate attribute combination. By evaluating all of the attribute subcombinations that are generated from a particular candidate attribute combination with respect to strength of association with the query attribute as above, a set of attributes constituting a core attribute combination can be identified. The identified core attributes can be stored as candidate attributes, or as a combination of candidate attributes (i.e., a candidate attribute combination). 
Various combinations of the core attributes can be independently evaluated for frequencies of occurrence and strength of association with the query attribute to determine a set containing even smaller attribute combinations comprised of subsets of core attributes, each of these even smaller core attribute combinations potentially having very different strengths of association with the query attribute. When compiled into attribute combination databases, these numerous small core attribute combinations can enable methods of predisposition prediction and predisposition modification to provide considerably more accurate, comprehensive, flexible and insightful results.
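The top-down approach described above can be sketched as follows. This is a minimal illustration only, under stated assumptions: attribute profiles are modeled as Python sets of attribute labels, strength of association is approximated by absolute risk (the fraction of profiles containing a combination that are query-attribute-positive), and the function names and sample profiles are hypothetical rather than taken from the specification.

```python
def absolute_risk(combo, positives, negatives):
    """Fraction of profiles containing `combo` that are query-attribute-positive.

    Assumption: absolute risk stands in for "strength of association".
    """
    pos = sum(1 for profile in positives if combo <= profile)
    neg = sum(1 for profile in negatives if combo <= profile)
    return pos / (pos + neg) if pos + neg else 0.0


def core_attributes(candidate, positives, negatives):
    """Return attributes whose removal lowers the strength of association."""
    candidate = frozenset(candidate)
    baseline = absolute_risk(candidate, positives, negatives)
    core = set()
    for attribute in candidate:
        # Exactly one subcombination is generated per attribute: the one lacking it.
        subcombination = candidate - {attribute}
        if absolute_risk(subcombination, positives, negatives) < baseline:
            # The missing attribute is the only difference, so it is treated as core.
            core.add(attribute)
    return core


# Hypothetical attribute profiles, for illustration only.
positives = [{"C", "E", "F"}, {"C", "E", "F", "X"}]
negatives = [{"C", "E"}, {"E", "F"}, {"X"}]
core = core_attributes({"C", "E", "F"}, positives, negatives)
```

In this hypothetical data, removing 'C' or 'F' allows the subcombination to match a query-attribute-negative profile, so both are flagged as core attributes, while removing 'E' leaves the strength of association unchanged.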

In another embodiment of a computationally efficient method for compiling co-associating attributes, a bottom-up approach is used for determining attribute combinations that are associated with a query attribute. This bottom-up approach generates sets of attributes in stages, starting with small attribute combinations and progressively building on those to generate larger and larger attribute combinations. At each stage, only the attribute combinations that are determined to be statistically associated with the query attribute are used as building blocks for the next stage of generating larger attribute combinations. The attributes used for generating these attribute combinations can be selected from an attribute profile, from an attribute combination, from a set of candidate attributes, or from a candidate attribute combination, for example. At each stage, all of the attribute combinations that are generated contain the same number of attributes, and can therefore be referred to as a set of n-tuple combinations of attributes, where n is a specified positive integer value designating the number of attributes in each n-tuple combination of attributes. This method can be used for de novo identification of attribute combinations that are statistically associated with a query attribute, as well as for identifying one or more core attribute combinations from a previously identified candidate attribute combination. The method can begin by generating n-tuples of any chosen size, size being limited only by the number of attributes present in the attribute profile, attribute combination, or set of attributes from which attributes are selected for generating the n-tuple combinations. However, it is preferable to begin with small size n-tuple combinations if using this bottom-up approach for the de novo identification of attribute combinations because this method typically requires generating all possible n-tuple combinations for the chosen starting value of n in the first step. 
If the n-tuple size chosen is too large, an unmanageable computational problem can be created. For example, if n=50 is chosen as the starting n-tuple size with a set of 100 attributes, all possible 50-tuple combinations from the 100 attributes would be approximately 1×10²⁹ combinations, which is unmanageable even with current supercomputing power. Therefore, it is more reasonable to choose 2-tuple, 3-tuple, 4-tuple, or 5-tuple sized combinations to start with, depending on the size of the set of attributes from which the n-tuple combinations will be generated and the amount of computing time and computer processor speed available. Once a first set of n-tuple combinations of attributes is generated, frequencies of occurrence are determined for each n-tuple combination in a query-attribute-positive set of attribute profiles and in a query-attribute-negative set of attribute profiles. Each n-tuple combination that is statistically associated with the query attribute is identified based on the frequencies of occurrence and stored in a compilation containing attribute combinations that are associated with the query attribute. If no n-tuple combinations are determined to be statistically associated with the query attribute, the value of n can be incremented by one and the method can be reiterated, beginning at the first step, for the larger n-tuple size. If, on the other hand, at least one n-tuple was determined to be statistically associated with the query attribute and stored in the compilation, a set of (n+1)-tuple combinations is generated by combining each stored n-tuple combination with each attribute in the set of attributes that does not already occur in that n-tuple (combining an n-tuple with an attribute from the set that already occurs in that n-tuple would create an (n+1)-tuple containing an attribute redundancy, which is undesirable). 
Next, frequencies of occurrence of the (n+1)-tuple combinations are determined and those (n+1)-tuple combinations which have a higher strength of association with the query attribute than the n-tuple combinations from which they were generated are stored in the compilation containing attribute combinations that are associated with the query attribute. Storing an (n+1)-tuple combination that does not have a higher strength of association with the query attribute than the n-tuple combination from which it is generated effectively adds an attribute combination to the compilation which contains an additional attribute that is not positively associated with the query attribute, something that is undesirable. Provided at least one (n+1)-tuple combination has a stronger statistical association with the query attribute and was stored, the value of n is incremented by one and a next iteration of the method is performed, so that the (n+1)-tuple combinations generated during the current iteration become the n-tuple combinations of the next iteration. By generating progressively larger n-tuple combinations at each iteration and storing those that have increasingly stronger statistical association with the query attribute than the ones before, a compilation of attribute combinations that are associated with the query attribute is generated which can be used effectively for methods of attribute prediction, predisposition prediction and predisposition modification.
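The bottom-up approach can be sketched along the following lines. This is a hedged illustration rather than a definitive implementation: "statistically associated" is approximated here by an absolute risk at or above a hypothetical `min_strength` threshold, attribute profiles are modeled as sets, and all names and sample data are assumptions introduced for the example.

```python
from itertools import combinations


def strength(combo, positives, negatives):
    """Absolute risk of the query attribute given `combo` (assumed measure)."""
    pos = sum(1 for profile in positives if combo <= profile)
    neg = sum(1 for profile in negatives if combo <= profile)
    return pos / (pos + neg) if pos + neg else 0.0


def bottom_up(attributes, positives, negatives, start_n=2, min_strength=0.6):
    """Compile attribute combinations associated with the query attribute."""
    attributes = sorted(attributes)
    current = {}
    n = start_n
    # First step: generate all n-tuples; if none qualify, retry with n + 1.
    while n <= len(attributes) and not current:
        for combo in combinations(attributes, n):
            combo = frozenset(combo)
            s = strength(combo, positives, negatives)
            if s >= min_strength:
                current[combo] = s
        n += 1
    compilation = dict(current)
    # Later steps: extend each stored n-tuple by one attribute it lacks and
    # keep only (n+1)-tuples that are stronger than the n-tuple they came from.
    while current:
        extended = {}
        for combo, s in current.items():
            for attribute in attributes:
                if attribute in combo:
                    continue  # avoid attribute redundancy
                bigger = combo | {attribute}
                s_bigger = strength(bigger, positives, negatives)
                if s_bigger > s:
                    extended[bigger] = s_bigger
        compilation.update(extended)
        current = extended
    return compilation


# Hypothetical attribute profiles, for illustration only.
positives = [{"C", "E", "F"}, {"C", "E", "F"}, {"C", "E"}]
negatives = [{"C", "E"}, {"E", "F"}, {"C", "X"}, {"X"}]
compilation = bottom_up({"C", "E", "F", "X"}, positives, negatives)
```

Because only qualifying n-tuples are extended, the number of (n+1)-tuples evaluated at each stage stays far below the full combinatorial count, which is the point of starting from small n.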

In one embodiment a method of identifying predisposing attribute combinations is provided which accesses a first dataset containing attribute combinations and statistical computation results that indicate the potential of each attribute combination to co-occur with a query attribute, the attributes being pangenetic, physical, behavioral, and situational attributes. A tabulation can be performed to provide, based on the statistical computation results, those attribute combinations that are most likely to co-occur with the query attribute, or a rank-ordering of attribute combinations of the first dataset that co-occur with the query attribute. In a further embodiment, ranking of the attribute combinations can include consideration of the attribute content of the attribute combinations, such as whether certain attributes are present or absent in a particular attribute combination, what percentage of attributes in a particular attribute combination are modifiable, what specific modifiable attributes are present in a particular attribute combination, and/or what types or categories of attributes (i.e., epigenetic, genetic, physical, behavioral, situational) are present in a particular attribute combination and in what relative percentages.

Similarly, a system can be developed which contains a subsystem for accessing or receiving a query attribute, a second subsystem for accessing a dataset containing attribute combinations comprising pangenetic, physical, behavioral and situational attributes that co-occur with one or more query attributes, a communications subsystem for retrieving the attribute combinations from at least one external database, and a data processing subsystem for tabulating the attribute combinations. The various subsystems can be discrete components, configurations of electronic circuits within other circuits, software modules running on computing platforms including classes of objects and object code, or individual commands or lines of code working in conjunction with one or more Central Processing Units (CPUs). A variety of storage units can be used including but not limited to electronic, magnetic, electromagnetic, optical, opto-magnetic and electro-optical storage.

In one application the method and/or system is used in conjunction with one or more databases, such as those that would be maintained by health-insurance providers, employers, or health-care providers, which can serve to store the aforementioned attribute combinations and corresponding statistical results. In one embodiment the attribute combinations are stored in a separate dataset from the statistical results and the correspondence is achieved using identifiers or keys present in (shared across) both datasets. In another embodiment the attribute combinations and corresponding statistical results data are stored with other attribute data. A user, such as a clinician, physician or patient, can input a query attribute, and that query attribute can form the basis for tabulating attribute combinations associated with that query attribute. In one embodiment the associations have been previously stored and are retrieved and displayed to the user, with the highest ranked (most strongly associated) combinations appearing first. In an alternate embodiment the tabulation is performed at the time the query attribute is entered and a threshold is used to determine the number of attribute combinations to be displayed.

FIG. 19 illustrates a flow chart for a method of attribute identification providing tabulation of attribute combinations that co-associate with an attribute of interest provided in a query. In receive query attribute step 1900, query attribute 1920 can be provided as one or more attributes in a query by a user. Alternatively, query attribute 1920 can be provided by automated submission, as part of a set of one or more stored attributes for example. In access attribute combinations and statistical results indicating strength of association with the query attribute step 1902, 1st dataset 1922 containing attribute combinations that co-occur with the query attribute and statistical results that indicate the corresponding strength of association of each of the attribute combinations with the query attribute is accessed. For this example the query attribute is ‘A’, and a representative 1st dataset is shown in FIG. 16. In transmit attribute combinations that co-associate with the query attribute step 1904, attribute combinations that co-occur with the query attribute are transmitted as output, preferably to at least one destination such as a user, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, a digital electronic receiver and a wireless receiver. In a further embodiment, the output may be transmitted as a tabulation having the attribute combinations ordered according to a rank assigned to each attribute combination based on their strength of association with the query attribute and/or attribute content, and the corresponding statistical results which indicate the strength of association of the attribute combinations with the query attribute can also be included in the tabulation. Further, attribute combinations can be included or excluded based on a predetermined statistical threshold and/or attribute content. 
For example, attribute combinations below a minimum strength of association (i.e., a predetermined statistical threshold) and/or those containing certain user specified attributes may be excluded. In one embodiment, a minimum strength of association can be specified by the user in reference to one or more statistical measures. In an alternative embodiment, a predetermined statistical threshold can be computed de novo by the computer system based on statistical results associated with the dataset. This can provide flexible thresholds that can be tailored to the range of data values in a particular dataset or tailored to a particular application, thereby potentially yielding more useful results.

As an example, a minimum strength of association requiring relative risk ≥ 1.0 may be chosen. Based on this chosen requirement, the tabulated list of attribute combinations shown in FIG. 20 would result from processing the 1st dataset represented in FIG. 16. The attribute combinations are ordered according to rank. In this example, rank values were automatically assigned to each attribute combination based on the number of attributes in each attribute combination and the magnitude of the corresponding absolute risk value. The higher the absolute risk value, the lower the numerical rank assigned. For attribute combinations having the same absolute risk, those with more total attributes per combination receive a lower numerical rank. This treatment is based on two tendencies of larger predisposing attribute combinations. The first is the general tendency of predisposing attribute combinations containing more attributes to possess a higher statistical strength of association with the query attribute. The second is the general tendency for elimination of a single attribute from larger combinations of predisposing attributes to have less of an effect on strength of association with the query attribute. The resulting tabulated list of FIG. 20 therefore provides a rank-ordered listing of predisposing attribute combinations toward attribute ‘A’, where the first attribute combination in the listing is ranked as the most predisposing attribute combination identified and the last attribute combination in the listing is ranked as the least predisposing attribute combination of all predisposing attribute combinations identified for the population of this example.
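The ranking rule described in this example can be sketched as a filter followed by a two-key sort. The records below are hypothetical placeholders and do not reproduce the contents of FIG. 16 or FIG. 20; the field and function names are likewise assumptions made for illustration.

```python
def rank_combinations(records, min_relative_risk=1.0):
    """Keep combinations meeting the threshold and assign ranks.

    Higher absolute risk gets a lower (better) numerical rank; ties are
    broken in favor of combinations containing more attributes.
    """
    kept = [r for r in records if r["relative_risk"] >= min_relative_risk]
    kept.sort(key=lambda r: (-r["absolute_risk"], -len(r["attributes"])))
    return [(rank, r["attributes"]) for rank, r in enumerate(kept, start=1)]


# Hypothetical records, for illustration only.
records = [
    {"attributes": "CE", "absolute_risk": 0.5, "relative_risk": 2.0},
    {"attributes": "CEF", "absolute_risk": 0.5, "relative_risk": 2.5},
    {"attributes": "CEFN", "absolute_risk": 1.0, "relative_risk": 10.0},
    {"attributes": "T", "absolute_risk": 0.05, "relative_risk": 0.4},
]
ranked = rank_combinations(records)
```

Here the combination 'T' falls below the relative risk threshold and is excluded, and 'CEF' outranks 'CE' because the two share the same absolute risk and 'CEF' contains more attributes.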

In one embodiment a method for predicting predisposition of an individual for query attributes of interest is provided which accesses a first dataset containing attributes associated with an individual and a second dataset containing attribute combinations and statistical computation results that indicate strength of association of each attribute combination with a query attribute, the attributes being pangenetic, physical, behavioral and situational attributes. A comparison can be performed to determine the largest attribute combination of the second dataset that is also present in the first dataset and that meets a minimum statistical requirement, the result being stored in a third dataset. The process can be repeated for a plurality of query attributes to generate a predisposition profile of the individual, which can be in the form of a data file, a record or a report, containing the individual's predisposition toward (potential for association with) each of the plurality of query attributes. In one embodiment, a tabulation can be performed to provide a predisposition prediction profile, record or report indicating the predisposition of the individual for each of the query attributes. In one embodiment, predisposition can be defined as a statistical result indicating strength of association between an attribute or attribute combination and a query attribute.

Similarly, a system can be developed which contains a subsystem for accessing or receiving a query attribute, a second subsystem for accessing a dataset containing attributes of an individual, a third subsystem for accessing attribute combinations of pangenetic, physical, behavioral, and situational attributes that co-occur with one or more query attributes, a communications subsystem for retrieving the attribute combinations from at least one external database, and a data processing subsystem for comparing and tabulating the attribute combinations. The various subsystems can be discrete components, configurations of electronic circuits within other circuits, software modules running on computing platforms including classes of objects and object code, or individual commands or lines of code working in conjunction with one or more Central Processing Units (CPUs). A variety of storage units can be used including but not limited to electronic, magnetic, electromagnetic, optical, opto-magnetic and electro-optical storage.

In one application the method and/or system is used in conjunction with one or more databases, such as those that would be maintained by health-insurance providers, employers, or health-care providers, which can serve to store the aforementioned attribute combinations and corresponding statistical results. In one embodiment the attribute combinations are stored in a separate dataset from the statistical results and the correspondence is achieved using identifiers, links or keys present in (shared across) both datasets. In another embodiment the attribute combinations and corresponding statistical results data are stored with the other attribute data. A user, such as a clinician, physician or patient, can input a query attribute, and that query attribute can form the basis for tabulating attribute combinations associated with that query attribute. In one embodiment the associations will have been previously stored and are retrieved and displayed to the user, with the highest ranked (most strongly associated) combinations appearing first. In an alternate embodiment the tabulation is performed at the time the query attribute is entered, and a threshold can be used to determine the number of attribute combinations that are to be displayed.

FIG. 21 illustrates a flow chart for a method of predicting predisposition of an individual toward an attribute of interest with which they currently have no association or with which their association is currently unknown. In receive query attribute step 2100, query attribute 2120 can be provided as one or more attributes in a query by a user. Alternatively, query attribute 2120 can be provided by automated submission, as part of a set of one or more stored attributes that may be referred to as key attributes. These key attributes can be submitted as a list, or they may be designated attributes within a dataset that also contains predisposing attribute combinations with corresponding statistical results indicating their strength of association with one or more of the key attributes.

For this example, query attribute ‘A’ is submitted by a user in a query. In access attributes of an individual step 2102 the attributes of an individual whose attribute profile is contained in a 1st dataset 2122 are accessed. A representative 1st dataset for individual #112 is shown in FIG. 22A. In access stored attribute combinations and statistical results step 2104, attribute combinations and corresponding statistical results for strength of association with query attribute 2120 contained in 2nd dataset 2124 are accessed. A representative 2nd dataset for this example is shown in FIG. 22B. In store the largest attribute combination of the 2nd dataset that is also present in the 1st dataset and meets a minimum statistical requirement step 2106, attribute combinations of 2nd dataset 2124 that are also present in 1st dataset 2122 are identified by comparison, and the largest identified attribute combination shared by both datasets and its corresponding statistical results for strength of association with the query attribute are stored in 3rd dataset 2126 if a minimum statistical requirement for strength of association is met. Absolute risk and relative risk are the preferred statistical results, although other statistical computations such as odds and odds ratio can also be used. A representative 3rd dataset is shown in FIG. 23A. Individual #112 possesses the largest predisposing attribute combination CEFNTY, for which the corresponding statistical results for strength of association with attribute ‘A’ are an absolute risk of 1.0 and a relative risk of 15.3. In process another query attribute? step 2108, a decision is made whether to perform another iteration of steps 2100-2106 for another attribute of interest. Continuing with this example, attribute ‘W’ is received and another iteration is performed. For this example, after completing this iteration there are no additional attributes of interest submitted, so upon reaching process another query attribute? 
step 2108, a decision is made not to perform any further iterations. The method concludes with tabulate predisposing attribute combinations and the corresponding statistical results step 2110, wherein all or a portion of the data of 3rd dataset 2126 is tabulated to provide statistical predictions for predisposition of the individual toward each of the query attributes of interest. In one embodiment, the tabulation can include ordering the tabulated data based on the magnitude of the statistical results, or the importance of the query attributes.
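Steps 2100 through 2108 can be sketched as a loop over query attributes. This is an illustrative sketch under stated assumptions: the values for CEFNTY (query attribute 'A') and CE (query attribute 'W') are those quoted in the text, while the individual's profile, the remaining combinations, the `min_absolute_risk` parameter and all function and field names are hypothetical stand-ins for FIGS. 22A and 22B, which are not reproduced here.

```python
def predict_predisposition(profile, combos_by_query, min_absolute_risk=0.0):
    """For each query attribute, store the largest matching combination.

    A combination matches when all of its attributes occur in the profile
    and it meets the (assumed) minimum absolute risk requirement.
    """
    third_dataset = {}
    for query, combos in combos_by_query.items():
        matches = [
            c for c in combos
            if set(c["attributes"]) <= profile
            and c["absolute_risk"] >= min_absolute_risk
        ]
        if matches:
            third_dataset[query] = max(matches, key=lambda c: len(c["attributes"]))
    return third_dataset


# Hypothetical profile for an individual; 'K' is an invented extra attribute.
profile = set("CEFNTY") | {"K"}
combos_by_query = {
    "A": [
        {"attributes": "CE", "absolute_risk": 0.2, "relative_risk": 1.5},      # hypothetical
        {"attributes": "CEFNTY", "absolute_risk": 1.0, "relative_risk": 15.3},  # from the text
        {"attributes": "CEFNTYZ", "absolute_risk": 1.0, "relative_risk": 16.0}, # hypothetical
    ],
    "W": [
        {"attributes": "CE", "absolute_risk": 0.36, "relative_risk": 0.7},      # from the text
        {"attributes": "CEQ", "absolute_risk": 0.5, "relative_risk": 1.1},      # hypothetical
    ],
}
third_dataset = predict_predisposition(profile, combos_by_query)
```

Combinations containing an attribute the individual lacks ('Z' or 'Q' in this hypothetical data) are never stored, regardless of their statistical results.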

In one embodiment, the tabulation can be provided in a form suitable for visual output, such as a visual graphic display or printed report. Attribute combinations do not need to be reported in predisposition prediction and can be omitted or masked so as to provide only the query attributes of interest and the individual's predisposition prediction for each. In creating a tabulated report for viewing by a consumer, counselor, agent, physician or patient, tabulating the statistical predictions can include substituting the terminology ‘absolute risk’ and ‘relative risk’ with the terminology ‘absolute potential’ and ‘relative potential’, since the term ‘risk’ carries negative connotations typically associated with the potential for developing undesirable conditions like diseases. This substitution may be desirable when the present invention is used to predict predisposition for desirable attributes such as specific talents or success in careers and sports. Also, the numerical result of absolute risk is a mathematical probability that can be converted to chance by simply multiplying it by 100%. It may be desirable to make this conversion during tabulation since chance is more universally understood than mathematical probability. Similarly, relative risk can be represented as a multiplier, which may facilitate its interpretation. The resulting tabulated results for this example are shown in FIG. 23B, in which all of the aforementioned options for substitution of terminology and conversion of statistical results have been exercised. The tabulated results of FIG. 23B indicate that individual #112 has a 100% chance of having or developing attribute ‘A’ and is 15.3 times as likely to have or develop attribute ‘A’ as someone in that population not associated with attribute combination CEFNTY. 
The results further indicate that individual #112 has a 36% chance of having or developing attribute ‘W’ and is 0.7 times as likely to have or develop attribute ‘W’ as someone in that population not associated with attribute combination CE.
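The terminology substitution and unit conversions described above amount to simple formatting. The following sketch assumes a dictionary-based record layout and hypothetical function and field names; the numerical inputs are the values quoted in the text for individual #112.

```python
def tabulate_potentials(predictions):
    """Convert absolute risk to a percentage and relative risk to a multiplier."""
    rows = []
    for query, result in predictions.items():
        rows.append({
            "query_attribute": query,
            # Probability multiplied by 100% to express it as chance.
            "absolute_potential": f"{result['absolute_risk'] * 100:.0f}%",
            # Relative risk presented as a multiplier.
            "relative_potential": f"{result['relative_risk']:g}x",
        })
    return rows


predictions = {
    "A": {"absolute_risk": 1.0, "relative_risk": 15.3},
    "W": {"absolute_risk": 0.36, "relative_risk": 0.7},
}
rows = tabulate_potentials(predictions)
```

The attribute combinations themselves are deliberately left out of each row, reflecting the option of omitting or masking them in the report.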

In one embodiment a method for individual destiny modification is provided which accesses a first dataset containing attributes associated with an individual and a second dataset containing attribute combinations and statistical computation results that indicate strength of association of each attribute combination with a query attribute, the attributes being pangenetic, physical, behavioral and situational attributes. A comparison can be performed to identify the largest attribute combination of the second dataset that consists of attributes of the first dataset. Then, attribute combinations of the second dataset that either contain that identified attribute combination or consist of attributes from that identified attribute combination can be stored in a third dataset. The content of the third dataset can be transmitted as a tabulation of attribute combinations and corresponding statistical results which indicate strengths of association of each attribute combination with the query attribute, thereby providing predisposition potentials for the individual toward the query attribute given possession of those attribute combinations. In one embodiment destiny can be defined as statistical predisposition toward having or acquiring one or more specific attributes.

Similarly, a system can be developed which contains a subsystem for accessing or receiving a query attribute, a second subsystem for accessing a dataset containing attributes of an individual, a third subsystem for accessing attribute combinations comprising pangenetic, physical, behavioral, and/or situational attributes that co-occur with one or more query attributes, a communications subsystem for retrieving the attribute combinations from at least one external database, and a data processing subsystem for comparing and tabulating the attribute combinations. The various subsystems can be discrete components, configurations of electronic circuits within other circuits, software modules running on computing platforms including classes of objects and object code, or individual commands or lines of code working in conjunction with one or more Central Processing Units (CPUs). A variety of storage units can be used including but not limited to electronic, magnetic, electromagnetic, optical, opto-magnetic, and electro-optical storage.

In one application the method and/or system is used in conjunction with one or more databases, such as those that would be maintained by health-insurance providers, employers, or health-care providers, which can serve to store the aforementioned attribute combinations and corresponding statistical results. In one embodiment the attribute combinations are stored in a separate dataset from the statistical results and the correspondence is achieved using identifiers, links or keys present in (shared across) both datasets. In another embodiment the attribute combinations and corresponding statistical results data are stored with the other attribute data. A user, such as a clinician, physician or patient, can input a query attribute, and that query attribute can form the basis for tabulating attribute combinations associated with that query attribute. In one embodiment the associations will have been previously stored and are retrieved and displayed to the user, with the highest ranked (most strongly associated) combinations appearing first. In an alternate embodiment the tabulation is performed at the time the query attribute is entered, and a threshold can be used to determine the number of attribute combinations that are to be displayed.

FIG. 24 illustrates a flow chart for a method of providing intelligent destiny modification in which statistical results for changes to an individual's predisposition toward a query attribute that result from the addition or elimination of specific attribute associations in their attribute profile are determined. In receive query attribute step 2400, query attribute 2420 can be provided as one or more attributes in a query by a user or by automated submission. In this example query attribute ‘A’ is received. In access attributes of an individual step 2402, the attribute profile of a selected individual contained in 1st dataset 2422 is accessed. For this example, a representative 1st dataset for individual #113 is shown in FIG. 25A. In access stored attribute combinations and statistical results step 2404, attribute combinations from 2nd dataset 2424 and corresponding statistical results for strength of association with query attribute 2420 are accessed. FIG. 16 illustrates a representative 2nd dataset. In identify the largest attribute combination in the 2nd dataset that consists of 1st dataset attributes step 2406, the largest attribute combination in 2nd dataset 2424 that consists entirely of attributes present in 1st dataset 2422 is identified by comparison. In this example, the largest attribute combination identified for individual #113 is CEF. In store attribute combinations of the 2nd dataset that either contain the identified attribute combination or consist of attributes from the identified attribute combination step 2408, those attribute combinations of 2nd dataset 2424 that either contain the largest attribute combination identified in step 2406 or consist of attributes from that attribute combination are selected and stored in 3rd dataset 2426. For this example both types of attribute combinations are stored, and the resulting representative 3rd dataset for individual #113 is shown in FIG. 25B. 
In transmit a tabulation of the attribute combinations and corresponding statistical results step 2410, attribute combinations from 3rd dataset 2426 and their corresponding statistical results are tabulated into an ordered list of attribute combinations and transmitted as output, wherein the ordering of combinations can be based on the magnitudes of the corresponding statistical results such as absolute risk values. Further, the tabulation may include only a portion of the attribute combinations from 3rd dataset 2426 based on subselection. A subselection of attribute combinations that are larger than the largest attribute combination identified in step 2406 may require the inclusion of only those that have at least a minimum statistical association with the query attribute. For example, a requirement can be made that the larger attribute combinations have an absolute risk value greater than that of the attribute combination identified in step 2406. This will ensure the inclusion of only those larger attribute combinations that show increased predisposition toward the query attribute relative to the attribute combination identified in step 2406. Similarly, a subselection of attribute combinations that are smaller than the attribute combination identified in step 2406 may require the inclusion of only those that have less than a maximum statistical association with the query attribute. For example, a requirement can be made that the smaller attribute combinations must have an absolute risk less than that of the attribute combination identified in step 2406. This will ensure the inclusion of only those smaller attribute combinations with decreased predisposition toward the query attribute relative to the attribute combination identified in step 2406.
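Steps 2406 through 2410, including the optional statistical subselection, can be sketched as follows. This is a hedged illustration: attribute combinations are modeled as frozensets, the absolute risk values are hypothetical placeholders (FIG. 16 is not reproduced here), and the function and field names are assumptions.

```python
def destiny_modification(profile, combos, subselect=True):
    """Tabulate combinations that extend or reduce the individual's largest one."""
    # Step 2406: largest combination consisting entirely of profile attributes.
    owned = [c for c in combos if c["attributes"] <= profile]
    base = max(owned, key=lambda c: len(c["attributes"]))
    # Step 2408: combinations that contain the base, or consist of its attributes.
    larger = [c for c in combos if c["attributes"] > base["attributes"]]
    smaller = [c for c in combos if c["attributes"] < base["attributes"]]
    if subselect:
        # Keep larger combinations only if they raise absolute risk, and
        # smaller combinations only if they lower it.
        larger = [c for c in larger if c["absolute_risk"] > base["absolute_risk"]]
        smaller = [c for c in smaller if c["absolute_risk"] < base["absolute_risk"]]
    # Step 2410: order the tabulation by magnitude of absolute risk.
    table = sorted(larger + [base] + smaller, key=lambda c: -c["absolute_risk"])
    return base, table


def combo(label, absolute_risk):
    """Helper (hypothetical) building a record from a string of attribute letters."""
    return {"attributes": frozenset(label), "absolute_risk": absolute_risk}


# Hypothetical profile and 2nd dataset, for illustration only.
profile = {"C", "E", "T", "K"}
combos = [combo("CE", 0.10), combo("CT", 0.20), combo("CET", 0.14),
          combo("CETY", 0.12), combo("CETN", 0.50), combo("FN", 0.30)]
base, table = destiny_modification(profile, combos)
labels = ["".join(sorted(c["attributes"])) for c in table]
```

With subselection enabled, the hypothetical combinations CETY and CT are dropped because they do not move absolute risk in the direction that would make them informative relative to the base combination CET.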

In one embodiment the method for individual destiny modification is used to identify and report attributes that the individual may modify to increase or decrease their chances of having a particular attribute or outcome. In one embodiment, the tabulation of attribute combinations produced by the method of destiny modification is filtered to eliminate those attribute combinations that contain one or more attributes that are not modifiable. In an alternate embodiment, modifiable attributes are prioritized for modification in order to enable efficient destiny (i.e., predisposition) modification. In one embodiment, non-historical attributes (attributes that are not historical attributes) are considered modifiable while historical attributes are considered not modifiable. In another embodiment, non-historical behavioral attributes are considered to be the most easily or readily modifiable attributes. In another embodiment, non-historical situational attributes are considered to be the most easily or readily modifiable attributes. In another embodiment, non-historical physical attributes are considered the most easily or readily modifiable attributes. In another embodiment, non-historical pangenetic attributes are considered the most easily or readily modifiable attributes. In one embodiment, the modifiable attributes are ranked or otherwise presented in a manner indicating which are most easily or readily modifiable, which may include creating categories or classes of modifiable attributes, or alternatively, reporting attributes organized according to the attribute categories of the invention.
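The filtering and prioritization of modifiable attributes can be sketched as below. The sketch assumes the embodiment in which non-historical attributes are modifiable and behavioral attributes are the most readily modified; the remaining ordering in `EASE_ORDER`, the attribute names and the catalog layout are hypothetical choices made only for illustration.

```python
# Assumed ordering for one embodiment; other embodiments rank categories differently.
EASE_ORDER = ["behavioral", "situational", "physical", "pangenetic"]


def is_modifiable(attribute, catalog):
    """Non-historical attributes are treated as modifiable."""
    return not catalog[attribute]["historical"]


def filter_modifiable(combinations, catalog):
    """Drop combinations containing any attribute that is not modifiable."""
    return [c for c in combinations
            if all(is_modifiable(a, catalog) for a in c)]


def prioritize(attributes, catalog):
    """Order modifiable attributes from most to least readily modifiable."""
    modifiable = [a for a in attributes if is_modifiable(a, catalog)]
    return sorted(modifiable,
                  key=lambda a: (EASE_ORDER.index(catalog[a]["category"]), a))


# Hypothetical attribute catalog, for illustration only.
catalog = {
    "smokes": {"category": "behavioral", "historical": False},
    "urban_residence": {"category": "situational", "historical": False},
    "bmi_over_30": {"category": "physical", "historical": False},
    "childhood_asthma": {"category": "physical", "historical": True},
}
combinations = [frozenset({"smokes", "bmi_over_30"}),
                frozenset({"smokes", "childhood_asthma"})]
filtered = filter_modifiable(combinations, catalog)
ordered = prioritize(catalog.keys(), catalog)
```

Swapping the order of entries in `EASE_ORDER` would model the other embodiments, in which situational, physical or pangenetic attributes are considered the most readily modifiable.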

FIG. 25C illustrates an example of tabulation of attribute combinations for individual #113 without statistical subselection of the larger and smaller attribute combinations. The larger attribute combinations show how predisposition is altered by adding additional attributes to the largest attribute combination possessed by individual #113 (bolded), and the smaller attribute combinations show how predisposition is altered by removal of attributes from the largest attribute combination possessed by the individual.

FIGS. 26A, 26B and 26C illustrate 1st dataset, 3rd dataset and tabulated results, respectively, for a different individual, individual #114, processed by the method for destiny modification using the same query attribute ‘A’ and the 2nd dataset of FIG. 16. The largest attribute combination possessed by individual #114 is CET, which has an absolute risk of 0.14 for predisposition toward query attribute ‘A’. In this case, the tabulation of attribute combinations in FIG. 26C is obtained by imposing statistical subselection requirements. The subselection required that only those larger attribute combinations having an absolute risk greater than 0.14 be included and that only those smaller attribute combinations having an absolute risk less than 0.14 be included. These subselection requirements result in the exclusion of larger attribute combination CETY and smaller attribute combination CT from the tabulation. In this example, the tabulation also exemplifies how the nomenclature and statistical computations may be altered to increase ease of interpretation. Absolute risk results have been converted to percentages, relative risk results have been converted to multipliers, and the terms absolute potential and relative potential have been substituted for the terms absolute risk and relative risk respectively. The tabulated listing of attribute combinations indicates what individual #114 can do to increase or decrease their predisposition toward query attribute ‘A’.

In one embodiment, a method for predisposition modification utilizing pangenetic, physical, behavioral, and/or situational attributes is provided in which a set of attributes for selective modification of the attribute profile of an individual is determined to enable the individual to modify their predisposition for acquiring an attribute of interest. The attribute of interest can be provided in the form of a query attribute received from a user or computer automated query. Additionally, a minimum strength of association value can be provided as input to serve as a threshold for ensuring that the resulting set of attributes for predisposition modification will provide at least a minimum degree of statistical certainty that the individual will acquire the attribute of interest (i.e., a minimum potential for association with the query attribute) upon modifying their attribute profile. A minimum strength of association value can be a value or result of a statistical measure, such as absolute risk or relative risk, that is used as a threshold for selecting attribute combinations having corresponding strength of association values at or above that threshold value, such as was previously described with respect to compiling attribute combination databases. Following receipt of a query attribute and minimum strength of association value, an attribute profile of an individual and a set containing attribute combinations and corresponding strength of association values (i.e., statistical measure results/values that can indicate the strength of association of an attribute combination with a query attribute, such as absolute risk and relative risk) can be accessed. One or more of the attribute combinations having corresponding strength of association values equal to or greater than the minimum strength of association value can be identified. From the identified attribute combinations, an attribute combination containing one or more attributes that do not occur in the attribute profile of the individual can be identified. The one or more attributes that do not occur in the attribute profile of the individual can be stored as a set of attributes for predisposition modification of the individual. The corresponding strength of association value of the selected attribute combination can be stored in association with the set of attributes for predisposition modification as an indicator of the individual's potential for association with the query attribute that would result from modifying the attribute profile of the individual with the set of attributes for predisposition modification.

Additionally, in one embodiment corresponding strength of association values can be stored for each of the attributes in the set of attributes for predisposition modification of the individual, to indicate the contribution of each of the attributes toward modifying the individual's potential for acquiring the query attribute. These corresponding strength of association values can be derived from the set containing attribute combinations and corresponding strength of association values. For example, in one embodiment the corresponding strength of association value of a first attribute combination can be subtracted from the strength of association value of a second attribute combination that differs from the first attribute combination only by possession of a single additional attribute. That single additional attribute can be considered to be responsible for any difference between the corresponding strength of association values of the two attribute combinations. Therefore, the strength of association value derived by this subtraction procedure can be assigned as a corresponding strength of association value to that single attribute which constitutes the difference in content between the two attribute combinations (pair of attribute combinations). If multiple pairs of attribute combinations in the set containing attribute combinations happen to differ by the same single attribute, then a plurality of corresponding strength of association values can be derived and then averaged to generate the corresponding strength of association value for that single attribute.
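The subtraction and averaging procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `per_attribute_contributions`, the representation of attribute combinations as frozensets, and all numeric values are hypothetical and are not taken from the figures.

```python
from statistics import mean


def per_attribute_contributions(strengths):
    """For every pair of combinations differing by exactly one additional
    attribute, subtract the smaller combination's value from the larger one's,
    then average the differences obtained for each attribute."""
    differences = {}
    for first, first_value in strengths.items():
        for second, second_value in strengths.items():
            extra = second - first
            if first < second and len(extra) == 1:
                (attribute,) = extra
                differences.setdefault(attribute, []).append(
                    second_value - first_value
                )
    return {attr: mean(values) for attr, values in differences.items()}


# Hypothetical strength of association values (e.g., absolute risks).
strengths = {
    frozenset("C"): 0.05,
    frozenset("CE"): 0.10,
    frozenset("CT"): 0.11,
    frozenset("CET"): 0.14,
}
contributions = per_attribute_contributions(strengths)
```

Here attribute T is derived from two pairs (C to CT, and CE to CET), so its value is the average of 0.06 and 0.04; attribute E is likewise the average of 0.05 and 0.03.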

A corresponding strength of association value derived for a single attribute as described above can be used to indicate that particular attribute's contribution (or potential/predicted contribution) toward predisposition to the query attribute. The single attribute can be an attribute selected from the set of attributes for predisposition modification. As such, a corresponding strength of association value can be derived for each attribute contained in the set of attributes for predisposition modification of the individual, and then stored in association with the particular attribute to which it refers (corresponds). Corresponding strength of association values can be stored within the set of attributes for predisposition modification, or they can be stored in a different set or database and linked to the attributes to which they correspond.

In one embodiment, if a particular subset of attributes is selected from the set of attributes for predisposition modification, a corresponding strength of association value can be derived for that particular subset of attributes by adding or mathematically compounding the corresponding strength of association values of the attributes that comprise that subset. As such, a composite strength of association value can be generated to indicate the contribution toward predisposition that the subset of attributes will provide if used collectively to modify the individual's attribute profile. This composite strength of association value can be added to the individual's original strength of association with the query attribute, which can be determined by a method of predisposition prediction disclosed previously herein using the individual's original attribute profile. In this way a statistical prediction can be generated which indicates the individual's statistical potential for acquiring the query attribute upon modifying their original attribute profile with only a subset of attributes selected from the set of attributes for predisposition modification. In another embodiment, the corresponding strength of association value for a subset of attributes can be determined by directly deriving the value from a pair of attribute combinations, from the set containing attribute combinations and corresponding strength of association values, which differ in content by the full complement of attributes constituting the subset. This can provide a more accurate statistical prediction for the contribution of the subset to predisposition modification than the alternative of adding or compounding corresponding strength of association values that were individually determined for each of the attributes that comprise the subset. In one embodiment, corresponding strength of association values are generated using each approach, and the two values averaged to generate a single corresponding strength of association value for a subset of attributes for predisposition modification.
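The three ways of valuing a subset described above (adding per-attribute values, deriving the value directly from a pair of attribute combinations, and averaging the two) can be sketched as follows. The function names and all numbers are hypothetical, and only simple addition is shown; "mathematically compounding" is not defined in enough detail here to implement.

```python
def composite_additive(contributions, subset):
    """Add the per-attribute strength of association values of the subset."""
    return sum(contributions[attribute] for attribute in subset)


def composite_direct(strengths, profile, subset):
    """Derive the value from the pair of combinations that differ by the
    full complement of attributes in the subset."""
    profile = frozenset(profile)
    return strengths[profile | frozenset(subset)] - strengths[profile]


def composite_averaged(contributions, strengths, profile, subset):
    """Average the additive and directly derived values."""
    additive = composite_additive(contributions, subset)
    direct = composite_direct(strengths, profile, subset)
    return (additive + direct) / 2


# Hypothetical inputs: per-attribute values and combination-level values.
contributions = {"E": 0.04, "T": 0.05}
strengths = {frozenset("C"): 0.05, frozenset("CET"): 0.16}

additive = composite_additive(contributions, "ET")
direct = composite_direct(strengths, "C", "ET")
averaged = composite_averaged(contributions, strengths, "C", "ET")

# Predicted potential after modification: original strength plus composite.
predicted = strengths[frozenset("C")] + averaged
```

In this assumed example the additive estimate (0.09) and the direct estimate (0.11) disagree, which is the situation in which averaging the two, or preferring the direct value, becomes a meaningful choice.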

Similarly, a computer based system for predisposition modification utilizing pangenetic, physical, behavioral, and/or situational attributes can be developed which contains a data receiving subsystem for receiving a query attribute and a minimum strength of association value; a first data accessing subsystem for accessing an attribute profile of an individual; a second data accessing subsystem for accessing a set (e.g., an attribute combination database) containing attribute combinations and corresponding strength of association values that indicate the strength of association of each of the attribute combinations with the query attribute; a data processing subsystem comprising a data comparison subsystem for identifying one or more of the attribute combinations having corresponding strength of association values equal to or greater than the minimum strength of association value, and for identifying non-historical attributes within the set of attributes for predisposition modification of the individual as potentially modifiable attributes; a data processing subsystem comprising a data selection subsystem for selecting, from the identified attribute combinations, an attribute combination containing one or more attributes that do not occur in the attribute profile of the individual; a data storage subsystem for storing the one or more attributes that do not occur in the attribute profile of the individual as a set of attributes for predisposition modification of the individual; and a data storage subsystem for storing one or more corresponding strength of association values for the attributes in the set of attributes for predisposition modification. The various subsystems can be discrete components, configurations of electronic circuits within other circuits, software modules running on computing platforms including classes of objects and object code, or individual commands or lines of code working in conjunction with one or more Central Processing Units (CPUs). A variety of storage units can be used including but not limited to electronic, magnetic, electromagnetic, optical, opto-magnetic, and electro-optical storage.

In one application the method and/or system is used in conjunction with one or more databases, such as those that would be maintained by health-insurance providers, employers, or health-care providers, which can serve to store the aforementioned query attributes, attribute profiles, attribute combinations, corresponding strength of association values, and sets of attributes for predisposition modification. In one embodiment the attribute combinations are stored in a separate dataset from the corresponding strength of association values and the correspondence is achieved using identifiers, links or keys present in (shared across) both datasets. In another embodiment the attribute combinations and corresponding strength of association values data are stored with other attribute data. A user, such as a clinician, physician or patient, can input a query attribute (and optionally, a minimum strength of association value) which can form the basis for generating the set of attributes for predisposition modification. In one embodiment the attributes for predisposition modification can be stored and then retrieved and displayed to the user. They can be ranked, with the highest ranked attributes (those having the greatest influence on predisposition toward the query attribute or a plurality of query attributes, or those that are most readily or easily modified) appearing higher on a tabulation that can be presented to the user. In an alternate embodiment the tabulation can be performed using a predetermined threshold to determine the number of attributes to be displayed, stored or transmitted.

FIG. 27 illustrates a flow chart for a method of predisposition modification. In receive query attribute and minimum strength of association value step 2700, a query attribute 2720 and a minimum strength of association value 2722 are received from a user or automated submission. In access attribute profile of an individual step 2702, an attribute profile 2724 of an individual is accessed in preparation for comparison with a set of attribute combinations. In access set containing attribute combinations and statistical results step 2704, a set of attribute combinations 2726 which contains attribute combinations and corresponding strength of association values that indicate the strength of association of the attribute combinations with the query attribute is accessed. In identify attribute combinations having corresponding strength of association values greater than or equal to the minimum strength of association value step 2706, attribute combinations from the set of attribute combinations 2726 that have corresponding strength of association values at or above the minimum strength of association value received in step 2700 are identified for further processing. In select an attribute combination containing attributes that do not occur in the attribute profile step 2708, a single attribute combination containing one or more attributes that do not occur within attribute profile 2724 is selected from among the attribute combinations identified in previous step 2706. If more than one attribute combination from among those identified in step 2706 satisfies the requirement of containing one or more attributes that do not occur in the attribute profile of the individual, a requirement of selecting the attribute combination containing the fewest number of attributes that do not occur in the attribute profile of the individual can be imposed within step 2708. In store the attributes that do not occur in the attribute profile step 2710, the one or more attributes from the selected attribute combination that do not occur within attribute profile 2724 are stored as a set of attributes for predisposition modification 2728.
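Steps 2700 through 2710 can be sketched in code as follows. This is a minimal illustration, not the disclosed implementation: the function name, the frozenset representation, and the numeric values are hypothetical, the set of attribute combinations is assumed to be already specific to the query attribute, and breaking ties between equally small candidate sets by preferring the higher strength of association is an added assumption.

```python
def predisposition_modification(profile, combinations, min_strength):
    """Return (attributes to acquire, strength of association) or None."""
    profile = frozenset(profile)                        # step 2702
    candidates = []
    for combo, strength in combinations.items():        # step 2704
        if strength < min_strength:                     # step 2706 threshold
            continue
        missing = combo - profile
        if missing:                                     # step 2708 requirement
            candidates.append((missing, strength))
    if not candidates:
        return None
    # Fewest attributes absent from the profile; ties go to higher strength
    # (the tie-break is an assumption, not stated in the description).
    return min(candidates, key=lambda c: (len(c[0]), -c[1]))  # stored in 2710


# Hypothetical combinations and strengths for a single query attribute.
combinations = {
    frozenset("CET"): 0.14,
    frozenset("CETK"): 0.31,
    frozenset("CETKM"): 0.45,
    frozenset("CEY"): 0.10,
}
result = predisposition_modification("CET", combinations, min_strength=0.30)
```

With these assumed values, both CETK and CETKM pass the threshold, and the fewest-attributes requirement selects CETK, so the stored set of attributes for predisposition modification contains only K.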

In one embodiment the method for predisposition modification can be repeated for a succession of query attributes to generate a plurality of sets of attributes for predisposition modification of the individual. In another embodiment, one or more sets of attributes for predisposition modification of the individual can be transmitted (i.e., output) to a user, a computer readable memory, a computer readable medium, a database, a computer processor, a computer on a network, a visual display, a printout device, a wireless receiver and/or a digital electronic receiver. In another embodiment, one or more sets of attributes for predisposition modification of the individual can be transmitted to generate a predisposition modification report or record. In another embodiment, non-historical attributes can be identified within the set of attributes for predisposition modification of the individual as potentially modifiable attributes. In a further embodiment, the identified non-historical attributes can be transmitted (i.e., output) to generate a report or record regarding potentially modifiable attributes for predisposition modification. In another embodiment, a requirement can be imposed that the one or more attributes that do not occur in the attribute profile of the individual must be non-historical attributes. In another embodiment, historical attributes can be eliminated from the set of attributes for predisposition modification of the individual. In another embodiment, genetic and/or epigenetic attributes can be eliminated from the set of attributes for predisposition modification of the individual. In another embodiment, genetic and/or epigenetic attributes can be considered not modifiable and consequently classified and/or treated as historical attributes that are not modifiable. In another embodiment, the set of attributes for predisposition modification of the individual (i.e., the output) can be linked to (i.e., stored in association with) an identifier of the individual, the attribute profile of the individual, and/or a record of the individual. In another embodiment the identity of the individual can be masked or anonymized. In another embodiment, corresponding strength of association values derived for the attributes in the set of attributes for predisposition modification can be stored along with those attributes in the set. In a further embodiment, the attributes in the set of attributes for predisposition modification of the individual can be rank-ordered based on the stored corresponding strength of association values. In another embodiment, each attribute occurring within a plurality of sets of attributes for predisposition modification toward one or more query attributes can be tabulated as a rank-ordered list of attributes that indicates which of the attributes have the greatest influence on predisposition toward the one or more query attributes, based on the number of sets that contain each attribute and the corresponding strength of association value(s) for each attribute. In one embodiment, the magnitude of effect that each attribute has on predisposition can be computed and used as a comparative measure to rank-order the predisposing attributes in the sets. For example, the magnitude of effect of each attribute on a plurality of query attributes can be calculated by adding the corresponding statistical results for an attribute with respect to one or more of the plurality of query attributes, to generate a composite statistical value for the effect of the attribute. This can be repeated for each of the predisposing attributes with respect to the particular query attributes they influence, and the resulting composite statistical values for each of the predisposing attributes compared with one another to rank the attributes, in order to indicate those that have the largest or smallest influence on predisposition to the plurality of query attributes, for example.
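The composite-value ranking described above can be sketched as follows. This is one possible reading, offered as an assumption: composite values are formed by simple addition across query attributes, and the number of sets containing an attribute is used only as a secondary sort key. The function name, query attribute labels and values are hypothetical.

```python
from collections import defaultdict


def rank_attributes(sets_by_query):
    """Rank attributes by composite statistical value across query attributes.

    sets_by_query maps each query attribute to a dict of
    {attribute: corresponding strength of association value}.
    """
    totals = defaultdict(float)   # composite value per attribute
    counts = defaultdict(int)     # number of sets containing the attribute
    for attribute_values in sets_by_query.values():
        for attribute, value in attribute_values.items():
            totals[attribute] += value
            counts[attribute] += 1
    # Largest composite value first; set count, then name, break ties.
    return sorted(totals, key=lambda a: (-totals[a], -counts[a], a))


# Hypothetical sets of attributes for predisposition modification.
sets_by_query = {
    "A": {"K": 0.17, "M": 0.14},
    "B": {"K": 0.05, "Y": 0.20},
}
ranking = rank_attributes(sets_by_query)
```

In this assumed example, K ranks first because its contributions toward both query attributes sum to more than the single contribution of Y or of M.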

In a preferred mode of comparing genetic attributes, specifically nucleotide sequences, for embodiments of the present invention disclosed herein, the comparison can be a direct sequence comparison requiring two or more sequences to be the same at the nucleotide sequence level, wherein each nucleotide can be represented by an individual attribute containing both nucleotide sequence position and nucleotide identity information. Therefore, a nucleotide sequence can be presented as a set of genetic attributes containing individual genetic attributes comprising nucleotide sequence information, such as nucleotide position and nucleotide identity information for nucleotides constituting a contiguous genetic sequence like a chromosome, or a gene located within a chromosome. To increase efficiency, at the cost of losing any important information contained in non-gene-coding regions of the genome, a direct sequence comparison between genomic sequences can use only gene coding and gene regulatory sequences since these represent the expressed and expression-controlling portions of the genome, respectively. In embodiments where computing power and time are available, a comparison of the whole genome can be performed as opposed to using only the 2% of the genome which encodes genes and gene regulatory sequences, since the noncoding region of the genome may still have effects on genome expression which influence attribute predisposition.
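The representation of a nucleotide sequence as position and identity attributes, and the option of restricting comparison to selected regions, can be sketched as follows. The function names, the example sequences, and the region coordinates are all hypothetical and chosen only for illustration.

```python
def sequence_to_attributes(sequence, first_position=1):
    """Represent each nucleotide as a (position, identity) attribute."""
    return {
        (first_position + offset, nucleotide)
        for offset, nucleotide in enumerate(sequence.upper())
    }


def restrict_to_regions(attributes, regions):
    """Keep only attributes whose position lies in an inclusive region."""
    return {
        (position, nucleotide)
        for position, nucleotide in attributes
        if any(start <= position <= end for start, end in regions)
    }


# Two hypothetical sequences that differ at positions 2 and 13 only.
attributes_1 = sequence_to_attributes("GGATGAAATAACC")
attributes_2 = sequence_to_attributes("GTATGAAATAACA")

# Hypothetical gene coordinates (positions 3 through 11).
gene_regions = [(3, 11)]

whole_match = attributes_1 == attributes_2
gene_only_match = (
    restrict_to_regions(attributes_1, gene_regions)
    == restrict_to_regions(attributes_2, gene_regions)
)
```

In this assumed example the whole-sequence comparison fails while the region-restricted comparison succeeds, which illustrates the trade-off between efficiency and the loss of information from regions excluded from comparison.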

With respect to regions of the genome that contain genes encoding proteins, in one embodiment, comparison engine 222 is permitted some degree of flexibility during comparison of nucleotide sequences, so that identical nucleotides at the same nucleotide positions within two nucleotide sequences encoding the same protein are not always required. For example, when a single nucleotide difference between two sequences encoding the same protein is deemed unlikely to result in a functional difference between the two encoded proteins, it is beneficial to make the determination that the two sequences are the same (i.e., equivalent, or identical) even though they are actually not identical. The reason for allowing non-identical matches is that, since the nucleotide difference is functionally silent with respect to the encoded protein that is ultimately expressed, it should not have a differential effect on attribute predisposition. A number of equivalence rules can be provided to comparison engine 222 to guide such decision making. These rules are based on the knowledge of several phenomena. For example, within an open reading frame of a nucleotide sequence encoding a protein, a single nucleotide difference in the 3rd nucleotide position of a codon (termed the wobble position) often does not change the identity of the amino acid encoded by the codon, and therefore does not change the amino acid sequence of the encoded protein. Determination of whether or not a particular nucleotide change in a wobble position alters the encoded protein amino acid sequence is easily made based on published information known to those in the art. Types of changes that are unlikely to affect protein function are those that are known to be functionally silent (i.e., silent mutations, and silent amino acid substitutions), those that result in conservative amino acid changes particularly in non-enzymatic, non-catalytic, non-antigenic or non-transmembrane domains of the protein, and those that simply alter the location of truncation of a protein within the same domain of one protein relative to another. Truncation of a protein can result from a nonsense mutation introduced by nucleotide substitution (i.e., point mutations), or alternatively, by nucleotide insertions or deletions which cause a frameshift within the open reading frame that introduces a stop codon acting as a premature translation termination signal of the encoded protein.

Allowing flexibility in sequence matching can increase the number of sequences determined to be identical, but may also reduce the sensitivity of the invention to detect predisposing attributes. There may be sequence changes which are thought to be innocuous or inconsequential based on current scientific knowledge that in actuality are not. For example, nucleotide changes in the wobble codon position that do not change the amino acid sequence may appear to be inconsequential, but may actually impact the stability of the intermediary RNA transcript required for translation of nucleotide sequence into the encoded protein, thus having a significant effect on ultimate levels of expressed protein. Therefore, application of the rules can be left up to the user's discretion or automatically applied when processing small populations of individuals where the low opportunity for exact matches resulting from small sample size increases the probability of obtaining an uninformative result.

In one embodiment, when a particular set of rules fails to provide sufficient detection of predisposing attributes, the rules can be modified in order to provide higher granularity or resolution for the discovery of predisposing attributes. As such, nucleotide changes in the wobble codon position may be examined in certain applications. By varying the rules, the appropriate level of granularity or resolution can be determined. In one embodiment, the rules are varied on a test population (which can be comprised of both attribute-positive and attribute-negative individuals) in an effort to determine the most appropriate rules for the greater population.

Based on this knowledge, the following equivalence rules can be applied by comparison engine 222 when comparing two nucleotide sequences:

-   a) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more nucleotides within the open reading frame that do not alter the amino acid sequence of the encoded protein;
-   b) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more nucleotides within the open reading frame that result in conservative amino acid substitutions within the amino acid sequence of the encoded protein;
-   c) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more nucleotides within the open reading frame that result in conservative amino acid substitutions anywhere within the amino acid sequence of the encoded protein except for enzymatic, transmembrane or antigen-recognition domains;
-   d) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more nucleotides within the open reading frame that result in silent amino acid substitutions;
-   e) a direct sequence comparison may determine two nucleotide sequences that do not encode amino acid sequences to be equivalent if they differ only by the identity of nucleotide mutations occurring at the same position within both sequences;
-   f) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more conservative missense mutations within the open reading frame of the encoded protein;
-   g) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more conservative missense mutations anywhere within the open reading frame encoding the protein except for those regions of the open reading frame that encode enzymatic, transmembrane or antigen-recognition domains of the protein;
-   h) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by one or more silent mutations within the open reading frame;
-   i) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by the locations of nonsense mutations within the open reading frame that occur within a same domain of the encoded protein;
-   j) a direct sequence comparison may determine two protein-encoding nucleotide sequences to be equivalent if they encode the same protein and differ only by the locations of frameshift mutations within the open reading frame that occur within the same respective domain of the encoded protein.
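Rules (a) and (h) above, which treat sequences as equivalent when they differ only by nucleotide changes that leave the encoded amino acid sequence unchanged, can be sketched as follows. This is an illustrative sketch only: the function names and example sequences are hypothetical, the standard genetic code is assumed, both inputs are assumed to be in-frame open reading frames of equal length, and the remaining rules (conservative substitutions, domain exclusions, nonsense and frameshift locations) are not implemented.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(open_reading_frame):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    orf = open_reading_frame.upper()
    if len(orf) % 3 != 0:
        raise ValueError("open reading frame length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def equivalent_by_silent_changes(orf_1, orf_2):
    """True if the two sequences encode the same amino acid sequence."""
    if len(orf_1) != len(orf_2):
        return False
    return translate(orf_1) == translate(orf_2)


# Hypothetical sequences: GCT and GCC both encode alanine (wobble position),
# whereas GAT encodes aspartate.
silent = equivalent_by_silent_changes("ATGGCTTAA", "ATGGCCTAA")
missense = equivalent_by_silent_changes("ATGGCTTAA", "ATGGATTAA")
determination = "match" if silent else "no match"
```

The first pair differs only in the wobble position of the second codon and is therefore reported as a match, while the second pair changes the encoded amino acid and is not.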

A method and system for genetic attribute analysis can be developed in which non-identical sets of genetic attributes comprising nucleotide sequence are compared to determine whether proteins encoded by those nucleotide sequences are functionally equivalent, and therefore whether genetic information contained in the sets of genetic attributes can be considered to be equivalent (i.e., a match, the same, and/or identical). A determination of equivalence between two or more non-identical yet essentially equivalent sets of genetic attributes can enable the compression of thousands of individual DNA nucleotide attributes into a single categorical genetic attribute assigned to represent those sets of genetic attributes, which is useful for methods such as attribute discovery, predisposition prediction and predisposition modification where a reduction in the amount of genomic data can enhance processing efficiency of the methods. Sets of genetic attributes can be determined to be equivalent based on whether they are able to satisfy one or more equivalence rules (i.e., requirements for equivalence) applied to their comparison. In one embodiment, the equivalence rules can be those listed as (a)-(j) above.

In one embodiment a computer based method for genetic attribute analysis is provided in which a first set of genetic attributes associated with a first individual (or a first group of individuals) comprising a first nucleotide sequence containing an open reading frame encoding a protein can be accessed. A second set of genetic attributes associated with a second individual (or a second group of individuals) comprising a second nucleotide sequence containing the open reading frame encoding the protein can also be accessed, wherein one or more nucleotides of the second nucleotide sequence differ from one or more nucleotides of the first nucleotide sequence. The first nucleotide sequence and the second nucleotide sequence can be compared to identify whether they are equivalent based on one or more equivalence rules for comparison of non-identical protein-encoding nucleotide sequences. A determination indicating that the first set of genetic attributes is identical to the second set of genetic attributes can be generated if the first nucleotide sequence and the second nucleotide sequence were identified to be equivalent, and the generated determination can be stored.

In one embodiment, a determination that two sets of genetic attributes are identical can be a determination assigned to each of the attributes of one set of genetic attributes with respect to their counterpart attributes in the other set. For example, the determination can indicate that the attribute containing the identity of the nucleotide in position 1 of the open reading frame comprised by the first set of genetic attributes is identical to (i.e., a match with) the attribute containing the identity of the nucleotide in position 1 of the open reading frame comprised by the second set of genetic attributes, and so forth for successive attributes representing nucleotides in successive positions of that open reading frame, for both sets of genetic attributes. In one embodiment, the determination is an indicator, flag, marker, or record of a match between two sets of attributes, or between individual attributes of the two sets of attributes. In another embodiment the determination can be one of two possible outcomes of a binary decision regarding whether the two sets of attributes are a match (i.e., equivalent, the same, or identical). In a further embodiment, the two possible outcome choices for a determination of identity or a match between attributes or attribute sets resulting from a comparison involving a binary decision can be, for example, any of the following: yes or no, 1 or 0, match or no match, identical or non-identical, equivalent or not equivalent, and same or different. In one embodiment, the determination can be linked to an attribute combination, a set of attributes, an attribute profile of an individual, a dataset, and/or a record in a database. In one embodiment, the determination can be transmitted to a user, a computer readable memory, a computer readable medium, a database, a dataset, a computer processor, a computer on a network, a visual display, a printout device, a wireless receiver and/or a digital electronic receiver.

FIG. 28 illustrates a flow chart for a method of genetic attribute analysis. In access 1st set of genetic attributes comprising a 1st nucleotide sequence step 2800, a 1st set of genetic attributes 2802 associated with an individual and comprising a first nucleotide sequence containing an open reading frame encoding a protein is accessed, for example, in a genetic database. In access 2nd set of genetic attributes comprising a 2nd nucleotide sequence that differs from the 1st nucleotide sequence step 2804, a 2nd set of genetic attributes 2806 associated with one or more individuals and comprising a second nucleotide sequence containing the open reading frame encoding the protein is accessed, wherein one or more nucleotides of the second nucleotide sequence differ from one or more nucleotides of the first nucleotide sequence. In identify whether the 1st and 2nd nucleotide sequences are equivalent step 2808, the first nucleotide sequence and the second nucleotide sequence are compared to identify whether they are equivalent according to one or more equivalence rules for comparison of non-identical nucleotide sequences. If the first and second nucleotide sequences are not identified as being equivalent, then exit step 2810 is executed, at which point the method may be reinitiated using different sets of genetic attributes, for example. If, however, the first and second nucleotide sequences are identified as being equivalent, then generate a determination indicating that the 1st and 2nd sets of genetic attributes are a match step 2812 is executed to generate a determination that the 1st and 2nd sets of attributes are identical (i.e., equivalent). In store the determination step 2814, the generated determination of a match is stored as equivalence determination 2816.

In one embodiment of a method for genetic attribute analysis, the first set of genetic attributes can comprise an attribute combination associated with the first individual (or a first group of individuals) and the second set of genetic attributes can comprise an attribute combination associated with the second individual (or a second group of individuals). In a further embodiment, an attribute combination can be a selected subset of attributes from a set of genetic attributes, the selection of the subset being performed according to empirical evidence indicating the importance of the subset, results from determinations of subsets based on studies made using full sets of genetic attributes (e.g., whole genome sequence), or other tests, calculations or determinations which provide for the creation of subsets of genetic attributes. In one embodiment, a frequency of occurrence of an attribute combination can be computed for a group of individuals (e.g., a query-attribute-positive or query-attribute-negative group of individuals) using a determination indicating that two sets of genetic attributes are identical, just as with the identification of identical non-genetic attributes occurring in different sets of attributes (i.e., attribute profiles, or attribute combinations) as described throughout the present disclosure. In one embodiment, a statistical result (e.g., an absolute risk, or a relative risk) indicating the strength of association of an attribute combination with a query attribute can be computed using the determination. In one embodiment, one or more statistical predictions indicating the potential association between the individual and the query attribute can be generated based, at least in part, on the determination.

In one embodiment, the first set of genetic attributes can comprise a portion of an attribute profile associated with a first individual (or a first group of individuals) and the second set of genetic attributes can comprise a portion of an attribute profile associated with a second individual (or a second group of individuals). In a further embodiment, the attribute profile contains only genetic attributes. In an alternative embodiment, the attribute profile can also contain epigenetic, physical, behavioral, and situational attributes, or any combination thereof. In one embodiment, a categorical attribute can be generated using the determination, in order to expand an attribute profile of an individual. In a further embodiment, the categorical attribute can be generated as a categorical genetic attribute that can be added, linked and/or associated with the attribute profile in order to expand the attribute profile (or to create a new expanded attribute profile containing only categorical genetic attributes associated with the individual or group of individuals).

Similarly, a system can be developed which comprises a first data accessing subsystem for accessing a first set of genetic attributes associated with a first individual (or a first group of individuals) comprising a first nucleotide sequence containing an open reading frame encoding a protein; a second data accessing subsystem for accessing a second set of genetic attributes associated with a second individual (or a second group of individuals) comprising a second nucleotide sequence containing the open reading frame encoding the protein, wherein one or more nucleotides of the second nucleotide sequence differ from one or more nucleotides of the first nucleotide sequence; a data processing subsystem comprising (i) a data comparison subsystem for identifying whether the first nucleotide sequence and the second nucleotide sequence are equivalent based on an equivalence rule for comparison of non-identical nucleotide sequences, and (ii) a data generating subsystem for generating a determination indicating that the first set of genetic attributes associated with the first individual is identical to the second set of genetic attributes associated with the second individual, if the first nucleotide sequence and the second nucleotide sequence were identified to be equivalent; a data storage subsystem for storing the determination; and a communications subsystem for transmitting the determination to a user, a computer readable memory, a computer readable medium, a database, a dataset, a computer processor, a computer on a network, a visual display, a printout device, a wireless receiver and/or a digital electronic receiver. The various subsystems can be discrete components, configurations of electronic circuits within other circuits, software modules running on computing platforms including classes of objects and object code, or individual commands or lines of code working in conjunction with one or more Central Processing Units (CPUs). Various storage units can be used including but not limited to electronic, magnetic, electromagnetic, optical, opto-magnetic and electro-optical storage.

In one application the method and/or system is used in conjunction with one or more databases, such as those that would be maintained by health-insurance providers, employers, or health-care providers, which can serve to store the aforementioned sets of genetic attributes, attribute profiles, attribute combinations, frequencies of occurrence, corresponding statistical results, datasets, database records, categorical attributes, equivalence rules and determinations. In one embodiment the equivalence determinations are stored separately (i.e., in a separate dataset) from attributes or sets of genetic attributes and the correspondence is achieved using identifiers, links or keys present in or shared between them (i.e., across datasets). In one embodiment, sets of genetic attributes can be either stored together with other attribute data (e.g., an attribute profile) for an individual or groups of individuals, or can be stored separately from other attribute data as in a dedicated genetic attribute database, human genome database or genetic data repository. In one embodiment, attribute combinations and corresponding statistical results data can be stored together with other attribute data. In another embodiment, attribute combinations and corresponding statistical results can be stored separately from each other and/or separately from other attribute data. A user, such as a clinician, physician or patient, can input a query attribute for determining whether sets of genetic attributes are equivalent, and the determination can then form the basis for identifying attributes that co-occur with other query attributes in certain individuals (e.g., attribute combinations that segregate with query-attribute-positive individuals) and for use in methods of attribute combination database creation, attribute discovery, attribute prediction, side effect prediction, and predisposition (i.e., destiny) modification.

In biological organisms and systems, age and sex type are two somewhat unique and powerful attributes that influence the expression of many other attributes. For example, age is a primary factor associated with: predicting onset and progression of age-associated diseases in humans and animals; acquiring training and life experiences that lead to success in career, sports and music; and determining life-style choices. Similarly, biological sex type is correlated with profound differences in expression of physical, behavioral and situational attributes. The inclusion of accurate data for the age and sex of individuals is very important for acquiring accurate and valid results from the methods of the present invention. In one embodiment, specific values of age and sex that aggregate with a query attribute can be determined by the methods of the present invention, just as for other attributes, to either co-occur or not co-occur in attribute combinations that are associated with a query attribute. In one embodiment, results of the methods can be filtered according to age and/or sex. In other embodiments, a population or subpopulation can be selected according to age and/or sex (age-matching and/or sex-matching) and then only that subpopulation subjected to additional processing by methods of the present invention. In another embodiment, an age-matched and/or sex-matched population may be used to form query-attribute-positive and query-attribute-negative groups. In another embodiment, the sex and/or age of an individual is used to select a population of age-matched and/or sex-matched individuals for creation of an attribute combinations database. In another embodiment, the sex and/or age of an individual is used to select a subpopulation of age-matched and/or sex-matched individuals for comparison in methods of identifying predisposing attribute combinations, individual predisposition prediction and individual destiny modification. In another embodiment, summary statistics for age and/or sex are included with the output results of the methods. In another embodiment, summary statistics for age and/or sex are included with the output results of the methods when other attributes are omitted or masked for privacy.

Additional embodiments are envisioned which implement a preselection of individuals processed by methods of the present invention. In one embodiment, preselection is a selection or pooling of one or more populations or subpopulations of individuals from one or more datasets or databases based on particular values of attributes such as income, occupation, disease status, zip code or marital status, for example. Preselecting populations and subpopulations based on possession of one or more specified attributes can serve to focus a query on the most representative population, reduce noise by removing irrelevant individuals whose attribute data may contribute to increasing error in the results, and decrease computing time required to execute the methods by reducing the size of the population to be processed. Also, using preselection to define and separate different populations enables comparison of predisposing attribute combinations toward the same query attribute between those populations. For example, if two separate subpopulations are selected (a first population of individuals that earn over $100,000/year and a second population of individuals that earn less than $10,000/year) and each subpopulation is processed separately to identify predisposing attribute combinations for a query attribute of alcoholism, a comparison of the identities, frequencies of occurrence, and strengths of association of predisposing attribute combinations that lead to alcoholism in individuals that earn over $100,000 can be made with those of individuals that earn less than $10,000. In one embodiment, predisposing attribute combinations that are present in one preselected population and absent in a second preselected population are identified. In one embodiment, the frequencies of occurrence and/or statistical strengths of association of predisposing attribute combinations are compared between two or more preselected populations. In one embodiment, only a single preselected population is selected and processed by the methods of the present invention.
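By way of non-limiting illustration, preselection of two income-defined subpopulations and comparison of the frequency of occurrence of an attribute combination between them can be sketched in Python as follows. The record layout, the helper names `preselect` and `combination_frequency`, and the four sample profiles are hypothetical and are not data from the present disclosure.

```python
from typing import Callable, Iterable

# Hypothetical attribute profiles: an income value plus a set of attribute labels.
profiles = [
    {"income": 120_000, "attributes": {"A", "C", "alcoholism"}},
    {"income": 150_000, "attributes": {"A", "E"}},
    {"income": 8_000, "attributes": {"C", "E", "alcoholism"}},
    {"income": 9_500, "attributes": {"A", "C"}},
]


def preselect(population: list, predicate: Callable[[dict], bool]) -> list:
    """Keep only the individuals whose profile satisfies the preselection rule."""
    return [person for person in population if predicate(person)]


def combination_frequency(population: list, combination: Iterable[str]) -> float:
    """Fraction of the population possessing every attribute in the combination."""
    if not population:
        return 0.0
    combo = set(combination)
    hits = sum(1 for person in population if combo <= person["attributes"])
    return hits / len(population)


high_income = preselect(profiles, lambda p: p["income"] > 100_000)
low_income = preselect(profiles, lambda p: p["income"] < 10_000)

# The same combination can then be compared between the two preselected populations.
freq_high = combination_frequency(high_income, {"A"})
freq_low = combination_frequency(low_income, {"A"})
```

Each preselected population can then be processed separately, so that the identities, frequencies and strengths of association of the resulting combinations can be compared side by side.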

Additional embodiments of the methods of the present invention are possible. In one embodiment, two or more mutually exclusive (having no attributes in common) predisposing attribute combinations for a query attribute are identified for a single individual and can be tabulated and presented as output. In one embodiment, the query attribute can be an attribute combination, and can be termed a query attribute combination. By submitting a query attribute combination to the methods of the present invention, the ability to identify attribute combinations that predispose toward other attribute combinations is enabled.

In one embodiment of the methods of the present invention, statistical measures for strength of association of attribute combinations are not stored in a dataset containing the attribute combinations, but rather are calculated at any time (on an as-needed basis) from the frequencies of occurrence of the stored attribute combinations. In one embodiment, only a portion of the results from a method of the present invention is presented, reported or displayed as output. In one embodiment, the results may be presented as a graphical display or printout including but not limited to a 2-dimensional, 3-dimensional or multi-dimensional axis, pie-chart, flowchart, bar-graph, histogram, cluster chart, dendrogram, tree or pictogram.
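By way of non-limiting illustration, computing strength-of-association measures on an as-needed basis from stored frequencies of occurrence can be sketched in Python as follows. The sketch assumes the conventional epidemiological definitions of absolute risk and relative risk; the `ComboCounts` record and its field names are hypothetical, as are the example counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComboCounts:
    """Stored frequencies of occurrence for one attribute combination (hypothetical layout)."""
    pos_with: int      # query-attribute-positive individuals possessing the combination
    neg_with: int      # query-attribute-negative individuals possessing the combination
    pos_without: int   # query-attribute-positive individuals lacking the combination
    neg_without: int   # query-attribute-negative individuals lacking the combination


def absolute_risk(c: ComboCounts) -> Optional[float]:
    """Probability of the query attribute among individuals who possess the combination."""
    total = c.pos_with + c.neg_with
    return c.pos_with / total if total else None


def relative_risk(c: ComboCounts) -> Optional[float]:
    """Risk with the combination divided by risk without it; None when undefined."""
    exposed = absolute_risk(c)
    unexposed_total = c.pos_without + c.neg_without
    if exposed is None or unexposed_total == 0 or c.pos_without == 0:
        return None
    return exposed / (c.pos_without / unexposed_total)


# Nothing but the raw counts is stored; both measures are derived when requested.
counts = ComboCounts(pos_with=30, neg_with=10, pos_without=20, neg_without=140)
risk = absolute_risk(counts)       # 30 / 40
ratio = relative_risk(counts)      # (30 / 40) / (20 / 160)
```

Storing only the counts keeps the dataset compact and allows a different statistical measure to be adopted later without rebuilding the dataset.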

Methods for predisposing attributes identification, predisposition prediction and intelligent destiny modification are subject to error and noise. A prominent cause of error and noise in methods is bias in the attribute data or in the distribution of the population from which the data is collected. In one embodiment, bias can manifest as inaccurate frequencies of occurrence and strengths of association between attribute combinations and query attributes, inaccurate lists of attributes determined to co-occur with a query attribute, inaccurate predictions of an individual's predisposition toward query attributes, and inaccurate lists of modifiable attributes for destiny modification. Bias can result from inaccurate data supplied to methods of the present invention, primarily as a consequence of inaccurate reporting and self-reporting of attribute data but also as a consequence of collecting attributes from populations that are biased, skewed or unrepresentative of the individual or population for which predisposition predictions are desired. Error can also result as a consequence of faulty attribute data collection such as misdirected or improperly worded questionnaires.

If bias exists and is left unchecked, it can have different effects depending on whether the bias exists with the query attribute, or whether the bias exists in one or more of the co-occurring attributes of an attribute combination. At a minimum, the existence of bias in the attribute data or population distribution may result in slightly inaccurate results for frequency of occurrence of attributes and attribute combinations, and inaccurate statistical strengths of association between attribute combinations and query attributes. When bias is present at higher levels, results for frequency of occurrence and strengths of association can be moderately to highly inaccurate, even producing false positives (Type I Error) and false negatives (Type II Error), where a false positive is the mistaken identification of an attribute association that actually does not exist (or does not exist differentially in one population relative to another) and a false negative is a failure to identify an attribute association that actually does exist (or exists differentially in one population relative to another).

For the methods disclosed herein, it is possible to minimize error and noise by ensuring that accurate (unbiased) attribute data are provided to the methods and that representative populations of individuals are used as the basis for creating attribute combination databases. It is anticipated that some degree of inaccuracy of input data will be present. The following disclosure indicates sources of error and noise and ways to identify, avoid and compensate for inaccurate attribute data and unrepresentative populations.

Selection bias is a major source of error and refers to bias that results from using a population of individuals that are not representative of the population for which results and predictions are desired. For example, if a query for attribute combinations that predispose an individual to becoming a professional basketball player is entered against an attribute combinations dataset that was created with an over-representation of professional basketball players relative to the general population, then smaller attribute combinations that are associated with both professional basketball players and individuals that are not professional basketball players will receive artificially inflated statistical strengths of association with the query attribute, giving a false impression that one needs fewer predisposing attributes than are actually required to achieve the goal with a high degree of probability. Selection bias is largely under the control of those responsible for collecting attribute profiles for individuals of the population and creating datasets that contain that information. Selecting an unrepresentative set of individuals will result in selection bias as discussed above. Sending questionnaires to a representative set of individuals but failing to receive completed questionnaires from a particular subpopulation, such as a very busy group of business professionals who failed to take time to fill out and return the questionnaire, will also result in selection bias if the returned questionnaires are used to complete a database without ensuring that the set of responses is a balanced and representative set for the population as a whole. Therefore, in one embodiment, administrators of the methods disclosed herein use a variety of techniques to ensure that appropriate and representative populations are used so that selection bias is not present in the attribute profiles and attribute combination datasets used as input data for the methods.

Information bias is the second major class of bias and encompasses error due to inaccuracies in the collected attribute data. The information bias class comprises several subclasses including misclassification bias, interview bias, surveillance bias, surrogate interview bias, recall bias and reporting bias.

Misclassification bias refers to bias resulting from misclassifying an individual as attribute-positive when they are attribute-negative, or vice-versa. To help eliminate this type of bias, it is possible to assign a null for an attribute in circumstances where an accurate value for the attribute cannot be ensured.

Interview bias refers to bias resulting from deriving attributes from questions or means of information collection that are not correctly designed to obtain accurate attribute values. This type of bias is primarily under the control of those administrators that design and administer the various modes of attribute collection, and as such, they can ensure that the means of attribute collection employed are correctly designed and validated for collecting accurate values of the targeted attributes.

Surveillance bias refers to bias that results from more closely or more frequently monitoring one subpopulation of individuals relative to others, thereby resulting in collection of more accurate and/or more complete attribute data for that subpopulation. This is common in cases of individuals suffering from disease, which results in their constant and close monitoring by experienced professionals who may collect more accurate and more complete attribute data about many aspects of the individual, including trivial, routine and common attributes that are not restricted to the medical field. An administrator of the methods disclosed herein can seek to reduce this bias by either excluding attribute information obtained as a consequence of surveillance bias or by ensuring that equivalent attribute information is provided for all members of the representative population used for the methods.

Surrogate interview bias refers to bias that results from obtaining inaccurate attribute information about an individual from a second-hand source such as a friend or relative. For example, when an individual dies, the only source of certain attribute information may be a parent or spouse of the individual who may have inaccurate perception or memory of certain attributes of the deceased individual. To help avoid this type of bias, it is preferable that a surrogate provider of attribute information be instructed to refrain from providing attribute values for which they are uncertain and instead assign a null for those attributes.

Recall bias refers to enhanced or diminished memory recall of attribute values in one subpopulation of individuals versus another. This again may occur in individuals that are subject to extreme situations such as chronic illness, where the individual is much more conscious and attentive to small details of their life and environment to which others would pay little attention and therefore not recall as accurately. This type of bias results from inaccuracy in self-reporting and can be difficult to detect and control for. Therefore, to minimize this type of bias, it is recommended that attempts to collect self-reported data be made over a period of time in which individuals are aware of attributes that are being collected and may even keep a record or journal for attributes that are subject to significant recall bias. Also, whenever more accurate means than self-reporting can be used to collect attribute values, the more accurate means should be used.

Reporting bias refers to bias resulting from intentional misrepresentation of attribute values. This occurs when individuals underestimate the value for an attribute or underreport or fail to report an attribute they perceive as undesirable or are in denial over, or alternatively, when they overestimate the value for an attribute or overreport or invent possession of an attribute they perceive as desirable. For example, individuals typically knowingly underestimate the quantity of alcohol they drink, but overestimate the amount of time they spend exercising. One approach to encourage accurate self-reporting of attribute values can be to allow the individual to control their attribute profile record and keep their identity masked or anonymous in results output or during use of their data by others, when creating attribute combinations databases for example. If bias can be determined to exist and quantified at least in relative terms, another approach can be to use mathematical compensation/correction of the attribute value reported by the individual by multiplying their reported value by a coefficient or numerical adjustment factor in order to obtain an accurate value. In one embodiment, this type of adjustment can be performed at the time the data is collected. In another embodiment, this type of adjustment can be performed during conversion and reformatting of data by data conversion/formatting engine 220.

In one embodiment, data conversion/formatting engine 220 works toward the removal of biases by the application of rules which assist in the identification of biased (suspect) attributes. In one embodiment, the rules cause the insertion of null attributes where the existing attribute is suspect. In an alternate embodiment, rules are applied to identify suspect attributes (e.g., overreporting of exercise, underreporting of alcohol consumption) and corrective factors are applied to those attributes. For example, if it is determined that users self-report consumption of alcohol at about ⅓ the actual rate consumed, the rules can, when attributes are suspect, increase the self-reported attribute by a factor of 1.5-3.0 depending on how the attribute is believed to be suspect. In large databases (e.g., health care databases) the size of the database can be used in conjunction with specific investigations (detailed data collection on test groups) to help develop rules to both identify and address biases.
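By way of non-limiting illustration, the two rule outcomes described above (insertion of a null, or application of a corrective factor) can be sketched in Python as follows. The function name `correct_reported_value` and its parameters are hypothetical; how an attribute is flagged as suspect and how the factor is chosen are left to the rules and are not shown.

```python
from typing import Optional


def correct_reported_value(
    reported: Optional[float],
    factor: float = 1.0,
    suspect: bool = False,
    nullify: bool = False,
) -> Optional[float]:
    """Apply one bias rule to a self-reported attribute value.

    A value that is not suspect passes through unchanged. A suspect value is
    either replaced by a null (None) or multiplied by a corrective factor.
    """
    if reported is None or not suspect:
        return reported
    if nullify:
        return None
    return reported * factor


# A suspect report of 4 drinks/week, corrected on the assumption that
# consumption is reported at about one third of the actual rate.
corrected = correct_reported_value(4.0, factor=3.0, suspect=True)

# The alternative rule outcome: the suspect value is replaced by a null.
nulled = correct_reported_value(4.0, suspect=True, nullify=True)
```

A factor below 3.0 (for example 1.5) could be passed when the attribute is believed to be only mildly underreported.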

In an alternate embodiment, actual possession of attributes and accurate values for self-reported attributes are determined using a multipronged data collection approach wherein multiple different inquiries or means of attribute collection are used to collect a value for an attribute prone to bias. One example of this approach is to employ a questionnaire that asks multiple different questions to acquire the same attribute value. For example, if one wants to collect the attribute value for the number of cigarettes a person smokes each week, a questionnaire can include the following questions which are designed to directly or indirectly acquire this information: “how many cigarettes do you smoke each day?”, “how many packs of cigarettes do you smoke each day?”, “how many packs of cigarettes do you smoke each week?”, “how many packs of cigarettes do you purchase each day?, each week?”, “how many cartons of cigarettes do you purchase each month?”, “how much money do you spend on cigarettes each day?, each week?, each month?”, “how many smoking breaks do you take at work each day?”. Another example is to ask a person to self-report how much time they spend exercising and also collect information from their gym that shows the time they swipe-in and swipe-out with their membership card. In this way, multiple sources of values for an attribute can be obtained and the values compared, cross-validated, deleted, filtered, adjusted, or averaged to help ensure storing accurate values for attributes.

In one embodiment, the comparison, cross-validation, deletion, filtering, adjusting and averaging of attribute values can be performed during conversion and reformatting of data by data conversion/formatting engine 220. In one embodiment, multiple values obtained for a single attribute are averaged to obtain a final value for the attribute. In one embodiment, values for an attribute are discarded based on discrepancies between multiple values for an attribute. In one embodiment, one value for an attribute is chosen from among multiple values obtained for the attribute based on a comparison of the multiple values. In an alternate embodiment, reported values that appear out of an acceptable range (e.g., statistical outliers) are discarded and the final attribute value is determined from the remaining reported values.
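By way of non-limiting illustration, discarding out-of-range values and averaging the remainder can be sketched in Python as follows. The outlier rule used here (discard any value that deviates from the median by more than a fixed fraction of the median) is only one assumed choice of acceptable range, and the function name `reconcile` and the sample values are hypothetical.

```python
from statistics import mean, median
from typing import Optional, Sequence


def reconcile(values: Sequence[Optional[float]], tolerance: float = 0.5) -> Optional[float]:
    """Combine several values collected for one attribute into a final value.

    Nulls are ignored, values farther than `tolerance * |median|` from the
    median are discarded as outliers, and the remaining values are averaged.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    mid = median(present)
    kept = [v for v in present if abs(v - mid) <= tolerance * abs(mid)]
    return mean(kept) if kept else None


# Cigarettes per week derived from four different questionnaire items
# (hypothetical numbers); the last answer is inconsistent with the others.
weekly_cigarettes = reconcile([70, 80, 75, 300])
```

Other embodiments described above, such as choosing a single value or discarding the attribute entirely when the values disagree, would replace the final averaging step.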

Although calculation of the following mathematical measures is not performed in the examples presented herein, statistical measures of confidence including but not limited to variance, standard deviation, confidence intervals, coefficients of variation, correlation coefficients (e.g., regression coefficient, Pearson product-moment correlation coefficient), residuals, t values (e.g., Student's t-test, one- and two-tailed t-distributions), ANOVA, standard error and p-values can be computed for the results of methods of the current invention, the computation of which is known to those of skill in the art. In one embodiment, these confidence measures provide a level or degree of confidence in the numerical results of the methods so that formal, standardized, legal, ethical, business, economic, medical, scientific, or peer-reviewable conclusions and decisions can be made based on the results. In another embodiment, these measures are computed and compared for frequencies of occurrence of attribute combinations during creation of an attribute combinations database, for example to determine whether the difference between frequencies of occurrence of an attribute combination for the query-attribute-positive and query-attribute-negative groups is statistically significant for the purpose, in a further embodiment, of eliminating those attribute combinations that do not have a statistically significant difference in frequency of occurrence between the two populations. Levels of significance and confidence thresholds can be chosen based on user preference, implementation requirements, or standards of the various industries and fields of application.
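By way of non-limiting illustration, testing whether the frequency of occurrence of an attribute combination differs significantly between the query-attribute-positive and query-attribute-negative groups can be sketched in Python as follows. A pooled two-proportion z-test is used here purely as an assumed example of a suitable test; the function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between x1/n1 and x2/n2."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if std_err == 0.0:
        return 0.0, 1.0  # both groups are all-or-nothing; no detectable difference
    z = (p1 - p2) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Combination found in 60 of 100 positive individuals and 30 of 100 negative ones.
z_score, p_value = two_proportion_z_test(60, 100, 30, 100)
keep_combination = p_value < 0.05  # the significance threshold is a user choice
```

Combinations for which `keep_combination` is false would be candidates for elimination from the attribute combinations database under the further embodiment described above.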

Aside from the purposes indicated in the above methods, the present invention can also be used for investigation of attribute interactions forming the basis for predisposition. For example, embodiments of the methods can be used to reveal which attributes have diverse and wide-ranging interactions, which attributes have subtle interactions, which attributes have additive effects and which attributes have multiplicative or exponential synergistic interactions with other attributes.

In one embodiment, synergistic interactions are particularly important because they have multiplicative or exponential effects on predisposition, rather than simple additive effects, and can increase predisposition by many fold, sometimes by as much as 1000 fold. These types of synergistic interactions are common occurrences in biological systems. For example, synergistic interactions routinely occur with drugs introduced into biological systems. Depending on the circumstances, this synergism can lead to beneficial synergistic increases in drug potency or to synergistic adverse drug reactions. Synergism also occurs in opportunistic infections by microbes. Synergism between attributes may also occur in development of physical and behavioral traits. For example, cigarette smoking and asbestos exposure are known to synergize in multiplicative fashion to cause lung cancer. The same is true for smoking combined with uranium radiation exposure. Exposure to fungal aflatoxin ingested via farm products combined with chronic hepatitis B infection synergistically causes liver cancer. Revealing synergistic interactions can be invaluable for intelligent and efficient targeting of therapies, treatments, training regimens, and lifestyle alterations to either increase or decrease predisposition toward an attribute of interest in the most rapid and efficient manner.
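By way of non-limiting illustration, the distinction between additive and multiplicative interaction can be sketched in Python as follows. The sketch assumes that relative risks are available for each attribute alone and for the two attributes together, and uses the conventional expectations of RR_A + RR_B - 1 for an additive model and RR_A x RR_B for a multiplicative model; the function names and the numerical values are hypothetical and are not measured risks for any exposure named above.

```python
from typing import Tuple


def expected_joint_rr(rr_a: float, rr_b: float) -> Tuple[float, float]:
    """Expected joint relative risk under (additive, multiplicative) models."""
    additive = rr_a + rr_b - 1.0
    multiplicative = rr_a * rr_b
    return additive, multiplicative


def closer_model(rr_a: float, rr_b: float, rr_ab: float) -> str:
    """Report which model's expectation lies nearer the observed joint relative risk."""
    additive, multiplicative = expected_joint_rr(rr_a, rr_b)
    if abs(rr_ab - additive) <= abs(rr_ab - multiplicative):
        return "additive"
    return "multiplicative"


# Hypothetical relative risks: 5 for attribute A alone, 10 for attribute B alone.
expectations = expected_joint_rr(5.0, 10.0)   # (14.0, 50.0)
observed_high = closer_model(5.0, 10.0, 45.0)
observed_low = closer_model(5.0, 10.0, 15.0)
```

An observed joint relative risk near the multiplicative expectation would flag the pair as a candidate synergistic interaction.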

FIG. 29A is a representative example of a 3rd dataset resulting from the method for destiny modification to determine predisposition of individual #1 of FIG. 14 toward attribute ‘W’. In contrast, FIG. 29B is a representative example of a 3rd dataset for individual #1 resulting from the method for destiny modification to determine predisposition toward attribute ‘W’ following elimination of attribute ‘A’ from their attribute profile. By comparing the two datasets, a before and after look at the predisposition of individual #1 toward having or developing attribute ‘W’ is provided, where ‘before’ refers to the situation in which attribute ‘A’ is still associated with the individual and ‘after’ refers to the situation in which attribute ‘A’ is no longer associated with the individual. From a comparison of these results, not only is the magnitude of the contribution of attribute ‘A’ toward predisposition revealed, but synergistic interactions of other attributes with attribute ‘A’ are also revealed.

In the ‘before’ situation shown in FIG. 29A, the individual possesses the attribute combination ACE. Addition of association to either attribute I, K or Q alone increases absolute risk to 1.0. However, in the ‘after’ situation of FIG. 29B where the individual begins with the combination CE, adding association to either attribute I, K or Q alone has little or no positive effect on predisposition. This reveals that I, K and Q require synergism with A to contribute significantly toward predisposition to query attribute W in this example. Furthermore, addition of a combination of IQ or IK still has no positive effect on predisposition in the absence of A. This indicates that I can synergize with A but not with Q or K. Interestingly, when the combination KQ is added to the combination CE in the absence of A, absolute risk jumps to 1.0. This indicates that K and Q can synergize with each other in the presence of CE, effectively increasing predisposition to a maximum even in the absence of attribute A.

In the various embodiments of the present invention, the question as to how the results are to be used can be considered in the application of a particular embodiment of the method of attribute identification. In instances where the goal is to determine how to reduce predisposition toward an undesirable attribute, for example, utilizing one embodiment of the method to determine the identity of predisposing attribute combinations and then proceeding to eliminate an individual's association with those attributes is one way to reduce predisposition toward that attribute. However, one may also attempt to decrease predisposition by applying an embodiment of the method to determine those attribute combinations that are predisposing toward an attribute that is the opposite of the undesirable attribute, and then proceed to introduce association with those attributes to direct predisposition of the individual toward that opposing attribute. In other words, the attributes that predispose toward a key attribute may in many cases not be the simple opposite of attributes that predispose to the opposite of the key attribute. Approaching this from both angles may provide additional effectiveness in achieving the goal of how to most effectively modify predisposition toward a key attribute of interest. In one embodiment, both approaches are applied simultaneously to increase success in reaching the goal of destiny modification.

Confidentiality of personal attribute data can be a major concern to individuals that submit their data for analysis. Various embodiments of the present invention are envisioned in which the identity of an individual is linked directly or indirectly to their data, or masked, or provided by privileged access or express permission, including but not limited to the following embodiments. In one embodiment, the identity of individuals is linked to their raw attribute profiles. In one embodiment, the identity of individuals is linked directly to their raw attribute profiles. In one embodiment, the identity of individuals is linked indirectly to their raw attribute profiles. In one embodiment, the identity of individuals is anonymously linked to their raw attribute profiles. In one embodiment, the identity of individuals is linked to their raw attribute profiles using a nondescriptive alphanumeric identifier. In one embodiment, the identity of individuals is linked to the attribute combinations they possess as stored in one or more datasets of the methods. In one embodiment, the linkage of identity is direct. In one embodiment, the linkage of identity is indirect. In one embodiment, the linkage of identity requires anonymizing or masking the identity of the individual. In one embodiment, the linkage of identity requires use of a nondescriptive alphanumeric identifier.
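By way of non-limiting illustration, indirect linkage of identity through a nondescriptive alphanumeric identifier can be sketched in Python as follows. The class name `IdentityVault`, its methods, the boolean authorization flag and the sample name are hypothetical; a deployed system would substitute its own access control and persistent storage.

```python
import secrets


class IdentityVault:
    """Keeps the identity-to-identifier mapping apart from the attribute data."""

    def __init__(self) -> None:
        self._token_by_identity = {}
        self._identity_by_token = {}

    def pseudonym_for(self, identity: str) -> str:
        """Return a stable random identifier that reveals nothing about the person."""
        if identity not in self._token_by_identity:
            token = secrets.token_hex(8)  # 16 hexadecimal characters
            while token in self._identity_by_token:  # regenerate on a collision
                token = secrets.token_hex(8)
            self._token_by_identity[identity] = token
            self._identity_by_token[token] = identity
        return self._token_by_identity[identity]

    def resolve(self, token: str, authorized: bool = False) -> str:
        """Recover the identity only under privileged access or express permission."""
        if not authorized:
            raise PermissionError("identity lookup requires authorization")
        return self._identity_by_token[token]


vault = IdentityVault()
record_key = vault.pseudonym_for("Jane Doe")
# Attribute profiles are stored under record_key rather than under the name.
profile_store = {record_key: {"A", "C", "E"}}
```

Because the identifier is random rather than derived from the name, possession of the attribute dataset alone does not reveal whose profile it is.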

Various embodiments of the present invention are envisioned in which data is made public, or held private, or provided restricted/privileged access granted upon express permission, and include but are not limited to the following embodiments. In one embodiment, the identity of attributes and statistical results produced in the output of the methods are provided only to the individual whose attribute profile was accessed for the query. In one embodiment, the identity of attributes and statistical results produced in the output of the methods are provided only to the individual that submitted or authorized the query. In one embodiment, the identity of attributes and statistical results produced in the output of the methods are provided only to the individual consumer that paid for the query. In one embodiment, the identity of attributes and statistical results produced in the output of the methods are provided only to a commercial organization that submitted, authorized or paid for the query. In one embodiment, the identities of attributes in the output results from methods of the present invention are omitted or masked. In one embodiment, the identity of attributes can be omitted, masked or made accessible to others by privileged access as dictated by the individual whose attribute profile was accessed for the query. In one embodiment, the identity of attributes can be made accessible to a government employee, legal professional, medical professional, or other professional legally bound to secrecy. In one embodiment, the identity of attributes can be omitted, masked or made accessible to others by privileged access as dictated by a government employee, legal professional, or medical professional. In one embodiment, the identity of attributes can be omitted, masked or made accessible to others by privileged access as dictated by a commercial organization.

FIG. 30 illustrates a representative computing system on which embodiments of the present method and system can be implemented. With respect to FIG. 30, a Central Processing Unit (CPU) 3000 is connected to a local bus 3002 which is also connected to Random Access Memory (RAM) 3004 and disk controller and storage system 3006. CPU 3000 is also connected to an operating system including BIOS 3008 which contains boot code and which can access disk controller and storage system 3006 to provide an operational environment and to run an application (e.g. attribute determination). The representative computing system includes a graphics adaptor 3020, display 3030, a wireless unit 3040 (i.e., a data receiver/transmitter device), a network adapter 3050 that can be connected to a LAN 3052 (local area network), and an I/O controller 3010 that can be connected to a printer 3012, mouse 3014, and keyboard 3016.

It will be appreciated by one of skill in the art that the present methods, systems, software and databases can be implemented on a number of computing platforms, and that FIG. 30 is only a representative computing platform, and is not intended to limit the scope of the claimed invention. For example, multiprocessor units with multiple CPUs or cores can be used, as well as distributed computing platforms in which computations are made across a network by a plurality of computing units working in conjunction using a specified algorithm. The computing platforms may be fixed or portable, and data collection can be performed by one unit (e.g. a handheld unit) with the collected information being reported to a fixed workstation or database which is formed by a computer in conjunction with mass storage. Similarly, a number of programming languages can be used to implement the methods and to create the systems disclosed herein, those programming languages including but not limited to C, Java, PHP, C++, Perl, Visual Basic, SQL and other languages which can be used to cause the representative computing system of FIG. 30 to perform the steps disclosed herein.

With respect to FIG. 31, the interconnection of various computing systems over a network 3100 to realize an attribute determination system 800, such as that of FIG. 8, is illustrated. In one embodiment, consumer 810 uses a Personal Computer (PC) 3110 to interface with the system and more specifically to enter and receive data. Similarly, clinician 820 uses a workstation 3130 to interface with the system. Genetic database administrator 830 uses an external genetic database 3150 for the storage of genetic/epigenetic data for large populations. Historical, situational, and behavioral data are all maintained on population database 3160. All of the aforementioned computing systems are interconnected via network 3100.

In one embodiment, and as illustrated in FIG. 31, an attribute determination computing and database platform 3140 is utilized to host the software-based components of attribute determination system 800, and data is collected as illustrated in FIG. 8. Once relevant attributes are determined, they can be displayed to consumer 810, clinician 820, or both. In an alternate embodiment, the software-based components of attribute determination system 800 can reside on workstation 3130 operated by clinician 820. Genetic database administrator 830 may also maintain and operate attribute determination system 800 and host its software-based components on external genetic database 3150. Another embodiment is also possible in which the software-based components of the attribute determination system 800 are distributed across the various computing platforms. Similarly, other parties and hosting machines not illustrated in FIG. 31 may also be used to create attribute determination system 800.

In one embodiment, the datasets of the methods of the present invention may be combined into a single dataset. In another embodiment the datasets may be kept separated. Separate datasets may be stored on a single computing device or distributed across a plurality of devices. As such, a memory for storing such datasets, while referred to as a singular memory, may in reality be a distributed memory comprising a plurality of separate physical or virtual memory locations distributed over a plurality of devices such as over a computer network. Data, datasets, databases, methods and software of the present invention can be embodied on computer-readable media (medium), computer-readable memory (including computer readable memory devices), and program storage devices readable by a machine.

In one embodiment, at least a portion of the attribute data for one or more individuals is obtained from medical records. In one embodiment, at least a portion of the attribute data for one or more individuals is accessed, retrieved or obtained (directly or indirectly) from a centralized medical records database. In one embodiment, at least a portion of the attribute data for one or more individuals is accessed or retrieved from a centralized medical records database over a computer network.

The methods, systems, software and databases disclosed herein have a number of industrial applications pertaining to the identification of attributes and combinations of attributes related to a query attribute; creation of databases containing the attributes, attribute combinations, strength of association with the query attribute, and rankings of strength of association with the query attribute; and use of the identified attributes, combinations of attributes, and strength of association of attributes with the query attribute in making a variety of decisions related to lifestyle, lifestyle modification, diagnosis, medical treatment, eventual outcome (e.g. destiny), possibilities for destiny modification, and sensitivity analysis (impact of modification of certain attributes).
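The sensitivity analysis mentioned above (impact of modification of certain attributes) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each attribute profile is a set of attribute names paired with a flag indicating whether the individual has the query attribute (condition), and that strength of association is approximated by a simple conditional frequency. The function names `condition_rate` and `rank_modifiable`, the sample attribute names, and the use of a rate difference as the impact measure are hypothetical stand-ins, not the statistical measures of the disclosed methods.

```python
def condition_rate(profiles, required, excluded=frozenset()):
    """Fraction of individuals with the condition among those whose profile
    contains every attribute in `required` and none in `excluded`."""
    matches = [has_condition for attrs, has_condition in profiles
               if required <= attrs and not (excluded & attrs)]
    if not matches:
        return None  # no individuals to estimate from
    return sum(matches) / len(matches)


def rank_modifiable(profiles, combination, modifiable):
    """Rank modifiable attributes in `combination` by how much the condition
    rate drops when the attribute is absent and the rest are kept."""
    baseline = condition_rate(profiles, combination)
    ranked = []
    for attr in sorted(combination & modifiable):
        remaining = combination - {attr}
        rate = condition_rate(profiles, remaining, excluded={attr})
        if baseline is None or rate is None:
            continue  # skip attributes that cannot be evaluated
        ranked.append((attr, baseline - rate))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Illustrative data only: (attribute set, has condition)
profiles = [
    ({"geneA", "smoking", "sedentary"}, True),
    ({"geneA", "smoking", "sedentary"}, True),
    ({"geneA", "smoking"}, True),
    ({"geneA", "smoking"}, False),
    ({"geneA", "sedentary"}, False),
    ({"geneA", "sedentary"}, False),
    ({"geneA"}, False),
]
combination = {"geneA", "smoking", "sedentary"}
modifiable = {"smoking", "sedentary"}
print(rank_modifiable(profiles, combination, modifiable))
# prints [('smoking', 1.0), ('sedentary', 0.5)]
```

The resulting rank ordered list places first the modifiable attribute whose absence is associated with the largest drop in condition frequency. With real data, a significance test and a minimum group size would be needed before drawing any conclusion from such differences.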

In one embodiment the methods, system, software, and databases disclosed herein are used as part of a web based health analysis and diagnostics system in which one or more service providers utilize pangenetic information (attributes) in conjunction with physical, situational, and behavioral attributes to provide services such as longevity analysis, insurance optimization (determination of recommended policies and amounts), and medication impact analysis. In these scenarios, the methods disclosed herein are applied using appropriate query attributes to determine such parameters as the likelihood that the patient will develop or has a particular disease, or to make an inquiry related to likelihood of disease development. In one embodiment, the genetic sample is mailed to an analysis center, where genetic and epigenetic sequencing is performed, and the data are stored in an appropriate database. Clinician 820 of FIG. 8 or consumer 810 of FIG. 8 provides for reporting of other data from which physical, situational, and behavioral attributes are developed and stored. A query related to a diagnosis can be developed by clinician 820 (or other practitioner) and submitted via the web. Using the methods and algorithms disclosed herein, a probable diagnosis or set of possible diagnoses can be developed and presented via the web interface. These diagnoses can be physical or mental. With respect to the diagnosis of mental illnesses (mental health analyses), identification of key behavioral and situational attributes (e.g. financial attributes, relationship attributes) which may affect mental health is possible using the present methods, systems, software and databases. Risk assessments can be performed to indicate what mental illnesses consumer 810 may be subject to, as well as suggesting modifications to behavior or living environment to avoid those illnesses. For example, a consumer subject to certain types of obsessive disorders might be advised to change certain behavioral and/or situational attributes which are associated with that obsessive disorder, thus decreasing the probability that they will have or exacerbate that disorder.

With respect to general analysis of medical conditions, the web based system can be used to evaluate insurance coverage (amounts and types) and provide recommendations for coverage based on the specific illness risks and attributes possessed by the consumer, and to evaluate the impact (or lack thereof) of lifestyle changes and the impact and effectiveness of medications. Such analyses can also be made in view of predisposition predictions that can indicate probable future development of a disorder, thereby allowing preparations for insurance coverage and therapeutic preventive measures to be taken in advance of the disorder.

As previously discussed, the system can be used for web based strength and weakness identification, by allowing the consumer or clinician to query the system to assess the probability that an individual has a particular strength or weakness. In one embodiment, parents query the system to determine if their child (from whom a biological sample was taken) will have particular strengths (e.g. music or sports) and what behavioral attributes should be adopted to maximize the probability of success at that endeavor, assuming a “natural talent” can be identified through the combinations of attributes associated with that endeavor. Various service providers, including genetic and epigenetic profiling entities, can interact with the system over a network (e.g., the internet) and allow the consumer or clinician to interact with the system over a network through a web-based interface to obtain the destiny or attribute information.

In one embodiment a web based goal achievement tool is presented in which the consumer enters one or more goals, and the system returns modifiable attributes which have been identified using the aforementioned analysis tools, indicating how the consumer can best obtain the desired goal(s) given their pangenetic, physical, situational, and behavioral makeup.

In one embodiment, potential relationship/life/marriage partners are located based on the pangenetic, physical, situational, and behavioral attributes of those individuals, as measured against an attribute model of a suitable partner developed for the consumer. The attribute model of the suitable partner can be developed using a number of techniques, including but not limited to, modeling of partner attributes based on attributes of individuals with which the individual has had previous successful relationships, determination of appropriate “complementary” attributes to the consumer based on statistical studies of individuals with similar attributes to the consumer who are in successful relationships and examination of their partner's attributes (determination of appropriate complementary attributes), and an ab initio determination of appropriate partner attributes. Once the attribute model for the most suitable potential partner has been developed, a database containing pangenetic, physical, situational and behavioral attribute data for potential partners for the consumer can be searched for the purpose of partner identification. In an alternate embodiment a consumer indicates persons they believe have suitable partner qualities including physical attraction (based on photos or video segments) as well as attributes described in profiles associated with the persons and their photos. In one embodiment the system uses genetic and epigenetic information associated with those individuals to create a subpopulation of individuals which the consumer believes they are attracted to, and examines a variety of data associated with that subpopulation (e.g., all available attribute data including genetic and epigenetic data) to determine attributes that are indicative of desirability to that consumer. In one embodiment the system uses those attributes to locate more individuals that could be potentially of interest to the consumer and presents those individuals to the consumer as potential partners.
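The alternate embodiment above, in which attributes indicative of desirability are derived from a subpopulation selected by the consumer and then used to locate further individuals, might be sketched in Python as follows. The functions `build_attribute_model` and `rank_candidates`, the 0.75 frequency threshold, and the overlap score are assumptions chosen for illustration. The disclosure does not specify a particular threshold or scoring rule.

```python
from collections import Counter


def build_attribute_model(liked_profiles, min_fraction=0.75):
    """Return the attributes present in at least `min_fraction` of the
    profiles the consumer indicated as desirable."""
    counts = Counter(attr for profile in liked_profiles for attr in set(profile))
    needed = min_fraction * len(liked_profiles)
    return {attr for attr, n in counts.items() if n >= needed}


def rank_candidates(model, candidates):
    """Score each candidate by the fraction of model attributes it possesses,
    highest score first (ties broken by candidate name)."""
    scored = []
    for name, attrs in candidates.items():
        score = len(model & set(attrs)) / len(model) if model else 0.0
        scored.append((name, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Illustrative data only.
liked = [
    {"runner", "musician", "nonsmoker"},
    {"runner", "nonsmoker", "chef"},
    {"runner", "nonsmoker"},
    {"musician", "nonsmoker"},
]
model = build_attribute_model(liked)  # {'runner', 'nonsmoker'}
candidates = {
    "P1": {"runner", "nonsmoker", "chef"},
    "P2": {"nonsmoker"},
    "P3": {"smoker"},
}
print(rank_candidates(model, candidates))
# prints [('P1', 1.0), ('P2', 0.5), ('P3', 0.0)]
```

The same scoring step could be applied to an attribute model obtained by any of the other techniques described, since it only requires a set of desired attributes and a searchable collection of candidate profiles.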

Although the aforementioned methods, systems, software and databases have been disclosed as incorporating and utilizing pangenetic, physical, situational and behavioral data, embodiments not utilizing pangenetic information are possible, with those embodiments being based solely on physical, situational and behavioral data. Such embodiments can be utilized to accomplish the tasks disclosed above with respect to the analysis of biological systems, as well as for the analysis of complex non-living systems which contain a multitude of attributes. As an example, a non-biological application of the methodology and systems disclosed herein would be for the analysis of complex electrical or electrical-mechanical systems in order to identify probable failure mechanisms (e.g. attributes leading to failure) and as such increase reliability through the identification of those failure-associated attributes. Additionally, the aforementioned embodiments are based on the use of information from multiple attribute categories. Embodiments using attribute information from a single attribute category (pangenetic, behavioral, physical, or situational) can be used in circumstances where attributes from a single category dominate in the development of a condition or outcome.

Embodiments of the present invention can be used for a variety of methods, databases, software and systems including but not limited to: pattern recognition; feature extraction; binary search trees and binary prediction tree modeling; decision trees; neural networks and self-learning systems; belief networks; classification systems; classifier-based systems; clustering algorithms; nondeterministic algorithms (e.g., Monte Carlo methods); deterministic algorithms; scoring systems; decision-making systems; decision-based training systems; complex supervised learning systems; process control systems; chaos analysis systems; interaction, association and correlation mapping systems; relational databases; navigation and autopilot systems; communications systems and interfaces; career management; job placement and hiring; dating services; marriage counseling; relationship evaluation; animal companion compatibility evaluation; living environment evaluation; disease and health management and assessment; genetic assessment and counseling; genetic engineering; genetic linkage studies; genetic screening; genetic drift and evolution discovery; ancestry investigation; criminal investigation; forensics; criminal profiling; psychological profiling; adoption placement and planning; fertility and pregnancy evaluation and planning; family planning; social services; infrastructure planning; species preservation; organism cloning; organism design and evaluation; apparatus design and evaluation; invention design and evaluation; clinical investigation; epidemiological investigation; etiology investigation; diagnosis, prognosis, treatment, prescription and therapy prediction, formulation and delivery; adverse outcome avoidance (i.e. prophylaxis); data mining; bioinformatics; biomarker development; physiological profiling; rational drug design; drug interaction prediction; drug screening; pharmaceutical formulation; molecular modeling; xenobiotic side-effect prediction; microarray analysis; dietary analysis and recommendation; processed foods formulation; census evaluation and planning; population dynamics assessment; ecological and environmental preservation; environmental health; land management; agriculture planning; crisis and disaster prediction, prevention, planning and analysis; pandemic and epidemic prediction, prevention, planning and analysis; weather forecasting; goal formulation and goal achievement assessment; risk assessment; formulating recommendations; asset management; task management; consulting; marketing and advertising; cost analysis; business development; economics forecasting and planning; stock market prediction; lifestyle modification; time management; emergency intervention; operational/failure status evaluation and prediction; system failure analysis; optimization analysis; architectural design; and product appearance, ergonomics, efficiency, efficacy and reliability engineering (i.e., product development).

The embodiments of the present invention may be implemented with any combination of hardware and software. If implemented as a computer-implemented apparatus, the present invention is implemented using means for performing all of the steps and functions disclosed above.

The embodiments of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer useable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the mechanisms of the present invention. The article of manufacture can be included as part of a computer system or sold separately.

While specific embodiments have been described in detail in the foregoing detailed description and illustrated in the accompanying drawings, it will be appreciated by those skilled in the art that various modifications and alternatives to those details could be developed in light of the overall teachings of the disclosure and the broad inventive concepts thereof. It is understood, therefore, that the scope of the present invention is not limited to the particular examples and implementations disclosed herein, but is intended to cover modifications within the spirit and scope thereof as defined by the appended claims and any and all equivalents thereof.

1. A non-transitory computer readable medium for determining core attributes containing pangenetic attributes and at least one modifiable lifestyle attribute associated with a condition comprising: a.) a database of attribute profiles, wherein each attribute profile comprises pangenetic and modifiable lifestyle attributes; and b.) a set of instructions that, when executed by a processor, cause that processor to perform the steps of: i.) forming, based on the receipt of a query representing the condition of a reference individual, a positive query group of individuals having the condition, and a negative query group of individuals not having the condition; ii.) creating a set of candidate attributes containing pangenetic attributes and at least one modifiable lifestyle attribute based on combinations of pangenetic and at least one modifiable lifestyle attribute frequently occurring in the positive query group of individuals but not in the negative query group of individuals; and iii.) determining a set of core attributes which contain pangenetic and at least one lifestyle attribute wherein the set of core attributes represent commonly occurring attribute profiles containing pangenetic and at least one modifiable lifestyle attribute which are statistically associated with the condition and wherein the modifiable lifestyle attribute, when removed, results in a significant decrease in the correlation of the remaining attributes with the condition.

2. The non-transitory computer readable medium of claim 1, further comprising a set of instructions that, when executed by a processor, cause that processor to perform the step of iv.) clustering sets of candidate attributes based on the frequency of occurrence of the pangenetic and at least one modifiable lifestyle attribute to provide the basis for the determination of the core attributes.

3. The non-transitory computer readable medium of claim 1, further comprising a set of instructions that, when executed by a processor, cause that processor to perform the step of iv.) storing the set of core attributes as a rank ordered list wherein the rank ordering corresponds to the ranking of modifiable lifestyle attributes which, when significantly altered, result in the most significant decrease in correlation of the remaining attributes with the condition.

4. The non-transitory computer readable medium of claim 1, further comprising a set of instructions that, when executed by a processor, cause that processor to perform the step of iv.) transmitting the set of core attributes as a list of pangenetic profiles and modifiable lifestyle attributes.

5. The non-transitory computer readable medium of claim 1, further comprising a set of instructions that, when executed by a processor, cause that processor to perform the step of iv.) receiving an attribute profile associated with an individual and containing pangenetic attributes and at least one modifiable lifestyle attribute; and v.) determining, based on attribute profile associated with the individual, the core attributes which are highly correlated with the attribute profile associated with the individual, and which contain lifestyle modifiable attributes associated with the individual.

6. The non-transitory computer readable medium of claim 5, further comprising a set of instructions that, when executed by a processor, cause that processor to perform the step of vi.) storing the core attributes which are highly correlated with the attribute profile associated with the individual as a rank ordered list wherein the rank ordering corresponds to the ranking of modifiable lifestyle attributes which, when significantly altered, result in the most significant decrease in correlation of the remaining attributes with the condition.

7. The non-transitory computer readable medium of claim 6, further comprising a set of instructions that, when executed by a processor, cause that processor to perform the step of vii.) transmitting at least a portion of the rank ordered list as a set of recommendations for lifestyle modification for the individual to modify the probability of having the condition.